Detailed Comparison

A comprehensive breakdown of how SourcePrep's architecture compares to other AI coding tools and context engines.

Architecture

Graph Construction
How the codebase is parsed and understood
SourcePrep
Native Rust Engine (Tree-sitter)

SourcePrep's Rust-native parser uses Tree-sitter to build a complete structural trace graph offline. Unlike tools that depend on an active IDE or LSP server, SourcePrep works headlessly — in CI/CD, on servers, or anywhere Rust runs. The parser handles 15+ languages and produces call-graph, import, and containment edges in a single pass.
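
As a rough illustration of the three edge kinds named above, the sketch below stores call, import, and containment edges in one adjacency structure. It is written in Python for brevity and is not SourcePrep's Rust implementation; the class name TraceGraph, the Edge fields, and the sample node names are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str  # one of "call", "import", "containment" (labels assumed here)


class TraceGraph:
    """Hypothetical in-memory graph with typed, directed edges."""

    def __init__(self):
        self._out = defaultdict(list)

    def add_edge(self, src, dst, kind):
        self._out[src].append(Edge(src, dst, kind))

    def neighbors(self, node, kind=None):
        # Return targets in insertion order, optionally filtered by edge kind.
        edges = self._out.get(node, [])
        return [e.dst for e in edges if kind is None or e.kind == kind]


graph = TraceGraph()
graph.add_edge("auth.py", "auth.handle_login", "containment")
graph.add_edge("auth.py", "db.py", "import")
graph.add_edge("auth.handle_login", "db.find_user", "call")

print(graph.neighbors("auth.py", kind="import"))
```

Keeping the edge kind on every edge is what lets later stages filter or weight traversal by relationship type.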

GitNexus
Node.js / WASM
GitNexus uses a Node.js/WASM architecture with Tree-sitter running in JavaScript. This works well for smaller repos, and their browser-based WASM option is genuinely innovative — zero installation needed. However, the Node.js runtime adds overhead for large codebases, and the browser sandbox limits memory. SourcePrep's native Rust engine is significantly faster for repos over 10K files.
Serena
Active LSP Server
Serena delegates all parsing to an active Language Server running in your IDE. This gives perfect type-resolved accuracy when the LSP is available — genuinely better than static analysis for type inference. However, it fails when the server isn't running, isn't configured for your language, or in headless environments. SourcePrep's offline Rust parser works without any running IDE process and produces a persistent graph that survives restarts.
Understand-Anything
LLM Multi-Agent Pipeline
Understand-Anything builds its graph through six specialized LLM agents (project-scanner, file-analyzer, architecture-analyzer, tour-builder, graph-reviewer, domain-analyzer) rather than a static parser. Using the LLM as the parser scales to many languages without per-language tooling — a genuinely clever distribution strategy. The tradeoff is that every full refresh costs LLM calls and carries hallucination risk. SourcePrep's deterministic Rust + Tree-sitter graph is reproducible, free to rebuild, and doesn't drift between runs.
bloop
Rust (Tree-sitter)
bloop also uses a Rust-native Tree-sitter parser, matching SourcePrep's parsing quality and speed. Their AST analysis is solid and well-engineered — credit where it's due. Where SourcePrep differentiates is in what happens after parsing: SourcePrep enriches the graph with LLM-inferred edges, epistemic understanding scores, and continuous deep analysis that evolves over time.
Grepai
Text Index
Grepai builds a basic text index for semantic search but doesn't parse code structure at all. There are no call-graph edges, no containment relationships, and no module boundaries. It's a powerful search tool, but SourcePrep provides full structural understanding on top of semantic search.
Empirica
Git Notes / No Graph
Empirica doesn't build a code graph at all. It focuses on epistemic state tracking via git notes — a fundamentally different philosophy. This is powerful for agent coordination and tracking what the AI thinks it knows, but provides no structural understanding of how the codebase is organized. SourcePrep combines structural graph analysis with epistemic enrichment, giving you both.
Vexp
SQLite / Tree-sitter
Vexp builds an AST graph using Tree-sitter stored in SQLite. This is a solid, well-engineered approach — fast for moderate repos and tightly integrated with VS Code. However, Vexp is locked to VS Code as its distribution mechanism. SourcePrep's standalone daemon works with any editor via MCP, can run headlessly for CI/CD team builds, and enriches the graph with LLM-inferred edges beyond what static parsing provides.
Search Architecture
How relevant context is found
SourcePrep
Local ONNX Embeddings + BM25

SourcePrep combines local ONNX embeddings (nomic-embed-text-v1.5) with BM25 keyword search in a hybrid architecture. Semantic search handles conceptual queries ('find the authentication flow') while BM25 catches exact identifiers ('handleLogin'). Everything runs 100% locally with no cloud dependency — embedding latency is ~7ms per query. Intent-aware routing automatically picks the best strategy per query.
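
To make the hybrid idea concrete, here is a minimal sketch of intent-aware score fusion. It assumes the lexical (BM25) and semantic (embedding) scores have already been computed and normalized to the 0..1 range. The identifier heuristic, the weight values, and the function names are assumptions for illustration, not SourcePrep's actual routing logic.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def looks_like_identifier(query: str) -> bool:
    """Heuristic: a single token in camelCase or snake_case."""
    q = query.strip()
    if not IDENTIFIER.match(q):
        return False
    return "_" in q or any(c.isupper() for c in q[1:])


def route_weights(query: str):
    """Return (lexical_weight, semantic_weight). Values are hypothetical."""
    return (0.8, 0.2) if looks_like_identifier(query) else (0.3, 0.7)


def hybrid_rank(query, lexical, semantic):
    """Fuse two score dicts (doc -> score in 0..1) into one ranked list."""
    w_lex, w_sem = route_weights(query)
    docs = set(lexical) | set(semantic)
    fused = {
        d: w_lex * lexical.get(d, 0.0) + w_sem * semantic.get(d, 0.0)
        for d in docs
    }
    # Sort by descending score, then by name for a stable order.
    return sorted(fused, key=lambda d: (-fused[d], d))


lexical = {"auth.py": 1.0, "docs.md": 0.1}
semantic = {"auth.py": 0.4, "docs.md": 0.9}
print(hybrid_rank("handleLogin", lexical, semantic))
print(hybrid_rank("find the authentication flow", lexical, semantic))
```

With these toy scores, the identifier query favors the exact lexical hit, while the natural-language query favors the semantically closer document.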

GitNexus
KuzuDB / FTS
GitNexus uses KuzuDB (an embedded graph database with vector support) and full-text search. This is a capable architecture, especially strong for graph traversal queries like 'what calls this function?' GitNexus deserves credit for integrating graph-native vector search. SourcePrep's advantage is the hybrid BM25+ONNX approach with intent-aware routing that automatically detects whether a query needs semantic, structural, or trace-based search.
Serena
LSP Queries
Serena queries the running Language Server for symbol lookups (find_symbol, find_references). This gives perfect accuracy for structured queries but cannot handle natural language or conceptual searches. You can ask 'find all callers of handleLogin' but not 'find the authentication flow.' SourcePrep handles both structured and natural language queries.
Understand-Anything
Fuzzy + Semantic
Understand-Anything supports fuzzy and semantic search across its node graph, which handles both literal lookups and conceptual queries through a clean slash-command UX. Where SourcePrep differs: hybrid BM25 + ONNX scoring with intent-aware routing automatically picks the right strategy per query, and trace-based expansion includes callers, callees, and module context for any matched node — Understand-Anything returns nodes, not relationship-expanded context.
bloop
Local Qdrant / Vector
bloop uses Qdrant (a local vector database) for semantic search. Their approach is well-engineered and handles semantic queries effectively. SourcePrep's advantage is the hybrid BM25+ONNX approach combined with intent-aware routing and trace-based expansion — when a function is found, SourcePrep automatically includes its callers, callees, and module context.
Grepai
Local Semantic Index
Grepai provides solid local semantic search with a privacy-first local embedding index. It handles natural language queries well and has clean MCP integration. However, it's purely a search tool — no graph traversal, no trace expansion, no module-aware routing. SourcePrep layers semantic search on top of a full trace graph that understands call relationships and module boundaries.
Empirica
Git Commit Hashes
Empirica doesn't provide code search. It references code via git commit hashes and file paths — its purpose is tracking the agent's epistemic state, not finding relevant code. These are complementary concerns, not competing ones.
Vexp
FTS5 + TF-IDF (No Embeddings)
Vexp explicitly avoids embeddings, using FTS5 + TF-IDF + graph centrality instead. They position this as faster and simpler, and for exact keyword matches, it works very well. But TF-IDF fundamentally cannot match 'authentication' to 'login' — it only finds literal string overlaps. SourcePrep's ONNX embeddings handle conceptual similarity while still being fully local and completing in under 10ms.
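
The lexical limitation is easy to see in a toy example. The sketch below computes a plain term-frequency cosine between two phrases; IDF is omitted because it only rescales the weight of terms the two texts share, so it cannot create a match where no term overlaps. This is an illustration of the general technique, not Vexp's code.

```python
import math
from collections import Counter


def tf_cosine(a: str, b: str) -> float:
    """Cosine similarity over whitespace-tokenized term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# No shared terms, so the lexical score is exactly zero.
print(tf_cosine("authentication flow", "login handler"))
# One shared term ("login") gives a non-zero score.
print(tf_cosine("login flow", "login handler"))
```

An embedding model, by contrast, places related words near each other in vector space, which is why it can relate the two phrases in the first call.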

Context Assembly

Context Delivery
What the AI actually receives
SourcePrep
LOD Capsule Context

SourcePrep delivers LOD capsule context: full source for focal nodes, signatures+docstrings for adjacent nodes, and module summaries for distant context. This gives the AI a natural zoom-in/zoom-out perspective that mirrors how human developers understand code. The result is rich, structured context that maximizes signal per token.
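
One way to picture the level-of-detail assignment is a breadth-first walk outward from the focal node. This is a minimal sketch, not SourcePrep's implementation: the function name lod_levels, the level labels, and the hop thresholds (1 hop gets a signature, 2 or more hops get a module summary) are assumptions chosen for illustration.

```python
from collections import deque


def lod_levels(adjacency, focal, max_depth=3):
    """Assign a detail level to each node reachable from the focal node.

    adjacency: dict mapping node -> iterable of neighbor nodes.
    Nodes farther than max_depth hops are left out of the capsule.
    """
    dist = {focal: 0}
    levels = {focal: "full_source"}
    queue = deque([focal])
    while queue:
        node = queue.popleft()
        if dist[node] >= max_depth:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                # Hypothetical thresholds: adjacent vs. distant.
                levels[neighbor] = (
                    "signature" if dist[neighbor] == 1 else "module_summary"
                )
                queue.append(neighbor)
    return levels


adjacency = {
    "handle_login": ["find_user", "hash_password"],
    "find_user": ["db_connect"],
}
print(lod_levels(adjacency, "handle_login"))
```

Because breadth-first search visits nodes in order of hop count, each node receives the level that matches its shortest distance from the focal point.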

GitNexus
Precomputed Raw Graph Data
GitNexus precomputes clusters and execution flows, then returns the raw graph data. This is more structured than sending raw files — the AI gets relational context instead of flat text. However, the AI still needs to parse the graph relationships itself. SourcePrep pre-assembles the context into human-readable capsules so the AI doesn't waste tokens interpreting graph structure.
Serena
Raw Symbol Matches
Serena returns raw symbol definitions and references from the LSP. These are accurate but uncompressed — you get the full function body, all references, with no prioritization or level-of-detail control. SourcePrep's LOD compression ensures the AI receives the right level of detail for each piece of context based on its distance from the focal point.
Understand-Anything
JSON Graph + Slash Commands
Understand-Anything's output is a committable .understand-anything/knowledge-graph.json artifact that agents query through slash commands (/understand-explain, /understand-chat, /understand-onboard). Per-node plain-English summaries are a real strength for human readability. The graph itself is the deliverable: there's no per-query LOD assembly that compresses adjacent nodes to signatures and distant nodes to module summaries the way SourcePrep's capsule does.
bloop
Raw Snippets
bloop returns raw code snippets matching the search query. The snippets are accurate and include surrounding context lines for readability, which is a nice touch. However, they lack structural context — there's no information about callers, imports, or module relationships that would help the AI understand how the code fits into the larger system.
Grepai
Raw File Chunks
Grepai returns raw file chunks matching the search query. There's no structural awareness, no LOD compression, and no context about how the matched code relates to the rest of the codebase. The search quality is good, but the delivery format wastes tokens on irrelevant surrounding code.
Empirica
Reasoning Checkpoints
Empirica delivers epistemic reasoning checkpoints — what the agent knew, what it learned, what changed. This is valuable for agent coordination but is orthogonal to code context delivery. It tells the agent about its own state, not about the codebase structure. Both types of context are useful; SourcePrep focuses on the code side.
Vexp
Capsule Context
Vexp implements capsule context very similarly to SourcePrep, with full source for pivot nodes and signatures for neighbors. Credit where it's due: this is one of the closest approaches to SourcePrep's LOD system and validates the core idea. The difference is SourcePrep's dual-engine compression (LOD for code, LLMLingua-2 for docs) and module-summary injection, which provide additional layers of context beyond what Vexp includes. SourcePrep's dashboard also lets you visually inspect the assembled capsule before it's sent.
Token Efficiency
Minimizing distractor tokens
SourcePrep
Dual-Engine Compression (3–20x)

SourcePrep achieves 3–20x token compression through a dual-engine approach: LOD-based structural compression for code (signatures instead of full bodies) and LLMLingua-2 token pruning for documentation (~2.4x). The compression level adapts dynamically per query and per client tier: Claude/Gemini get more full-source files, while local models get tighter compression to fit constrained windows.
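
The tier-adaptive behavior can be sketched as greedy packing under a token budget: include full source while it fits, fall back to the signature when it does not, and skip the node if neither fits. The function name, the token counts, and the two budgets below are hypothetical numbers for illustration, not SourcePrep's actual thresholds.

```python
def pack_context(nodes, budget):
    """Greedily fit nodes into a token budget.

    nodes: list of (name, full_tokens, signature_tokens), most relevant first.
    Returns (plan, used_tokens), where plan is a list of (name, level).
    """
    plan, used = [], 0
    for name, full_tokens, signature_tokens in nodes:
        if used + full_tokens <= budget:
            plan.append((name, "full"))
            used += full_tokens
        elif used + signature_tokens <= budget:
            plan.append((name, "signature"))
            used += signature_tokens
        # Otherwise the node is dropped from this capsule.
    return plan, used


nodes = [
    ("handle_login", 400, 40),
    ("find_user", 300, 30),
    ("hash_password", 250, 25),
]

large_plan, large_used = pack_context(nodes, budget=1000)  # large-window tier
small_plan, small_used = pack_context(nodes, budget=500)   # constrained tier

print(large_plan, large_used)
print(small_plan, small_used)
print(f"compression vs. full source: {large_used / small_used:.2f}x")
```

The same ranked node list yields a different capsule per budget, which is the effect the tier description refers to.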

GitNexus
High (via Precomputation)
GitNexus achieves high efficiency through precomputation — complex graph queries are resolved before the AI asks, so the response is already focused. This is a legitimate efficiency win that we respect. However, the precomputed responses are static and can't adapt their compression level based on the specific query. SourcePrep dynamically adjusts LOD per query, compressing more aggressively for broad questions and less for targeted ones.
Serena
Low (Full Symbols)
Serena returns full symbol bodies from the LSP. A single find_references call can return thousands of tokens of raw code. There's no compression, prioritization, or level-of-detail control.
Understand-Anything
Plain-English Node Summaries
Each node carries a natural-language summary, which is itself a form of compression — the agent reads prose instead of full source. Well-suited for the explain/onboard use cases the tool emphasizes. No dual-engine compression, no per-query LOD adjustment, and no client-tier-aware budgets that scale context to the model's window.
bloop
Low (Full Snippets)
bloop sends full code snippets with surrounding context. This is helpful for readability but increases token count significantly. There's no structural compression or level-of-detail control.
Grepai
Low (Full Chunks)
Grepai sends full file chunks matching the search. No compression, no structural awareness of what parts of the chunk are relevant to the query.
Empirica
Low (State Dumps)
Empirica's epistemic state dumps can be verbose — serialized reasoning chains and pre/postflight checkpoints aren't optimized for token budgets. The content is high-value but the format isn't compressed.
Vexp
High (Signature Only)
Vexp achieves good efficiency by returning only signatures for non-focal nodes. This is the same core strategy as SourcePrep's LOD system, and it works well. Vexp's compression is query-adaptive and effective. SourcePrep's additional edge comes from dual-engine compression (LOD for code, LLMLingua-2 for docs), module-summary injection, tier-adaptive LOD thresholds, and the BM25+semantic scoring that better prioritizes which nodes to include at all.

Epistemology & Trust

LLM Augmentation
How AI deepens the knowledge graph
SourcePrep
Flexible AI Pipeline (Cloud BYOK or Local)

SourcePrep uses local or bring-your-own-key LLMs to continuously augment the structural trace graph with deep semantic understanding. The pipeline generates module summaries, infers cross-module relationships, computes understanding scores, and validates edge correctness — all automatically. This is not simple indexing: it's a multi-stage epistemic enrichment process where each pass deepens the AI's comprehension. You can run it with a local Ollama model for zero-cloud privacy, or use your own OpenAI/Anthropic key for maximum quality.

GitNexus
None (Static Graph)
GitNexus builds a structural graph using Tree-sitter and KuzuDB but does not use any LLM to augment or enrich it. The graph captures syntactic relationships (calls, imports, containment) but has no semantic understanding of what the code does, why modules exist, or how concepts relate across boundaries.
Serena
None (LSP Only)
Serena relies entirely on the Language Server Protocol for code understanding. No LLM is used to augment or enrich the data. The accuracy is limited to what the LSP can provide — type information and symbol references — with no semantic layer on top.
Understand-Anything
LLM-First (6 Agents)
Understand-Anything goes further than any other tool here: LLMs aren't an enrichment layer, they are the index. Six specialized agents perform the parsing, architecture detection, tour generation, and review. The depth of LLM-derived semantic understanding is genuinely impressive. The structural tradeoff is that there's no deterministic ground truth underneath — SourcePrep keeps a free, reproducible Rust graph and lets LLMs add concepts and rationale on top, so you can run with zero LLM budget if you choose.
bloop
None (Index Only)
bloop builds a vector index for search but does not use LLMs to augment the index with semantic understanding. The search is effective for finding code, but there's no deeper comprehension layer — no module summaries, no relationship inference, no understanding scores.
Grepai
None (Embeddings Only)
Grepai uses embedding models for semantic search but does not use LLMs to augment or enrich a knowledge graph. There's no epistemic pipeline, no module summarization, and no relationship inference. The embeddings enable similarity search but don't build understanding.
Empirica
Agent-Driven LLM Assessment
Empirica uses LLM calls during its pre/postflight epistemic assessments — agents evaluate their own knowledge before and after tasks. This is a form of LLM augmentation, but it's focused on the agent's self-awareness rather than enriching a code knowledge graph. It doesn't generate module summaries or infer structural relationships. SourcePrep's approach augments the graph itself, while Empirica augments the agent's understanding of its own state.
Vexp
None (Static AST)
Vexp's graph is purely structural — built from Tree-sitter AST analysis and FTS5 indexing. There is no LLM augmentation step. The system understands code structure but not code meaning. Agent-written observations can add some semantic context, but this is manual and agent-driven, not an automated enrichment pipeline.
Continuous Enrichment
Refining understanding over time
SourcePrep
Trace Epistemology Pipeline

SourcePrep's Trace Epistemology Pipeline continuously enriches the knowledge graph: deep analysis generates module summaries, cross-module relationship inferences, and understanding scores. Each pipeline run builds on previous results, and the file watcher triggers incremental re-enrichment when code changes. The result is a knowledge base that improves over time, with progress visible in the dashboard's health scores.

GitNexus
Static Until Re-indexed
GitNexus builds its graph once and serves it statically until explicitly re-indexed. There's no continuous learning or enrichment between builds. For stable codebases this is fine, but for active development, the index quickly becomes stale.
Serena
None
Serena provides no enrichment. It queries the LSP in real-time and returns results. There's no persistent knowledge accumulation between sessions.
Understand-Anything
Incremental LLM Re-analysis
Understand-Anything supports incremental updates: only changed files get re-analyzed by the agent pipeline, with an optional --auto-update post-commit hook. This is a sensible design for keeping the graph fresh. Each refresh still costs LLM calls; SourcePrep's enrichment pipeline can run on local Ollama at zero cloud cost and persists understanding scores per module that improve across runs.
bloop
None
bloop rebuilds its index from scratch. No continuous enrichment or persistent learning between index builds.
Grepai
None
Grepai rebuilds its index from scratch on each run. No continuous enrichment or persistent learning.
Empirica
Git-Native Pre/Postflight
Empirica genuinely excels here. Its pre/postflight system has agents assess their knowledge before and after tasks, storing these assessments in git notes for version-controlled epistemic continuity. This creates real cross-session learning. SourcePrep's pipeline is more automated (no agent involvement needed) and produces structured graph enrichment, but Empirica's approach to tracking what the agent thinks it knows is innovative and we tip our hat to it.
Vexp
Session Memory
Vexp supports session memory — agents can save observations attached to graph nodes, and these persist across sessions. This is a thoughtful feature that enables incremental learning. However, it relies entirely on the agent to drive enrichment by writing good observations. SourcePrep's pipeline runs automatically in the background with no agent involvement needed, producing structured module summaries and understanding scores.
Drift Detection
Knowing when agent assumptions are stale
SourcePrep
Automated via Watcher & Graph

SourcePrep's file watcher monitors the codebase for changes and automatically marks affected trace nodes, observations, and enrichment data as stale. When a function changes, all observations about that function are flagged. The dashboard shows drift status at a glance with per-node granularity. No manual intervention needed.
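
A common way to implement this kind of staleness marking is to fingerprint each node's source and flag attached observations when the fingerprint changes. The sketch below assumes that approach; the ObservationStore class and its method names are hypothetical, and the actual watcher may detect changes differently.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Content hash used to detect that a node's source changed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class ObservationStore:
    """Hypothetical store that ties observations to node fingerprints."""

    def __init__(self):
        self._hashes = {}        # node -> fingerprint at last index
        self._observations = {}  # node -> list of {"text", "stale"} dicts

    def index(self, node, source):
        self._hashes[node] = fingerprint(source)

    def observe(self, node, text):
        self._observations.setdefault(node, []).append(
            {"text": text, "stale": False}
        )

    def on_change(self, node, new_source):
        """Called by a file watcher; returns True if the node drifted."""
        new_hash = fingerprint(new_source)
        if self._hashes.get(node) == new_hash:
            return False  # e.g. file touched but content unchanged
        self._hashes[node] = new_hash
        for obs in self._observations.get(node, []):
            obs["stale"] = True
        return True

    def stale(self, node):
        return [o["text"] for o in self._observations.get(node, []) if o["stale"]]


store = ObservationStore()
store.index("handle_login", "def handle_login(): pass")
store.observe("handle_login", "validates the password")
store.on_change("handle_login", "def handle_login(user): pass")
print(store.stale("handle_login"))
```

Hashing content rather than relying on modification times avoids flagging nodes whose files were saved without any real change.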

GitNexus
Manual git-diff Checks
GitNexus can detect changes via git-diff but requires manual re-indexing to update the knowledge graph. There's no automatic staleness tracking for individual nodes — the entire index is either current or it isn't.
Serena
None
Serena has no drift detection. It queries the LSP live, so in theory results are always current — but it has no concept of tracking what changed or what assumptions from previous sessions might be stale.
Understand-Anything
Diff Impact (On Demand)
/understand-diff analyzes which parts of the system are affected by a change — a real and useful feature for reviewing PRs. It runs when invoked rather than continuously flagging stale nodes after each edit. SourcePrep's file watcher marks individual nodes and observations stale automatically the moment their underlying code changes.
bloop
None
No drift detection. The index must be manually rebuilt when code changes.
Grepai
None
No drift detection. The index must be manually rebuilt when code changes.
Empirica
Mirror Drift Detection
Empirica's Mirror Drift Detection is genuinely strong. It tracks capability drops and knowledge degradation across sessions, alerting when the agent's understanding has become unreliable. This is one of Empirica's best features — they focus deeply on epistemic reliability. SourcePrep's approach is more granular (per-node vs per-session) and more visual (dashboard vs git log), but Empirica deserves real credit for pioneering this concept.
Vexp
Manual Observation Staling
Vexp marks observations as stale when their linked nodes change — a correct and useful approach. However, this only works for nodes that have agent-written observations attached. There's no automatic detection of semantic drift in the broader graph for nodes without observations.
Inspectability
Seeing what the AI sees
SourcePrep
Dedicated Desktop Health Dashboard

SourcePrep's dedicated desktop dashboard lets you visually browse the trace graph, see module health scores, inspect enrichment pipeline status, and fine-tune scope with a folder tree. You can see exactly what context the AI will receive before it receives it. This bird's-eye perspective of your codebase builds trust and gives developers real control over the AI's knowledge.

GitNexus
Web UI / Terminal
GitNexus offers a web UI and terminal interface for browsing the graph. The web UI is functional and shows precomputed clusters and wiki documentation. It's less purpose-built for context-inspection than SourcePrep's dashboard but provides reasonable visibility into the knowledge graph.
Serena
Opaque
Serena is largely opaque. The MCP tools execute and return results, but there's no interface to see what the system 'knows,' how it's reasoning about the codebase, or what context it would assemble for a given query.
Understand-Anything
Interactive Dashboard + Demo
Understand-Anything has the strongest visualization story of any tool on this page: a polished interactive dashboard with force-directed graphs, automatic layer view, domain view, and a public live demo at understand-anything.com/demo with a committed reference graph. Credit where it's due — this is the bar for exploratory navigation. SourcePrep's dashboard is purpose-built for a different question: graph health, enrichment status, and scope control rather than free-form graph exploration.
bloop
Desktop App
bloop has a dedicated desktop app with a polished code search UI. Credit where it's due — bloop's search interface is clean, fast, and pleasant to use. However, it focuses on search results rather than graph health, enrichment status, or context assembly inspection. SourcePrep's dashboard is specifically built for understanding and controlling the AI's knowledge, not just searching code.
Grepai
Terminal Only
Grepai is a CLI tool — terminal output only. You can see search results but there's no way to visualize the index, understand coverage gaps, or inspect what context would be assembled.
Empirica
Git Log Only
Empirica stores everything in git notes, viewable via git log. This is maximally transparent — everything is version-controlled and auditable, which is admirable. But it requires git expertise to inspect and there's no visual dashboard for at-a-glance understanding of the epistemic state.
Vexp
VS Code Only
Vexp operates as a VS Code extension with in-editor views. You can see the graph within VS Code, which is convenient and well-integrated. However, it's limited to VS Code users and doesn't offer the bird's-eye project view with health scores, scope management, and enrichment pipeline monitoring that SourcePrep's standalone dashboard provides.

Control & Customization

Scope Management
Controlling what the AI can see
SourcePrep
Visual Folder-Tree with Include/Exclude

SourcePrep provides a visual folder-tree in the dashboard for precise scope control. Include or exclude entire directories, individual files, or use glob patterns. Changes take effect immediately and the dashboard shows exactly which files are in-scope, how many nodes are indexed, and what percentage of the codebase is covered. This gives developers fine-grained control over the AI's view of the project.
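
The include/exclude logic behind a scope tree can be approximated with glob matching. The sketch below uses Python's fnmatch module, where `*` also matches path separators; the function names, the rule that exclusions win over inclusions, and the sample patterns are assumptions for illustration rather than SourcePrep's documented semantics.

```python
from fnmatch import fnmatchcase


def in_scope(path, include, exclude):
    """Exclude patterns win; otherwise any include pattern admits the path."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


def coverage(paths, include, exclude):
    """Return the in-scope paths and the percentage of files covered."""
    selected = [p for p in paths if in_scope(p, include, exclude)]
    percent = 100.0 * len(selected) / len(paths) if paths else 0.0
    return selected, percent


paths = [
    "src/auth/login.py",
    "src/vendor/lib.py",
    "docs/readme.md",
    "tests/test_login.py",
]
selected, percent = coverage(
    paths,
    include=["src/*", "tests/*"],
    exclude=["src/vendor/*"],
)
print(selected, percent)
```

Reporting the percentage alongside the selected files mirrors the coverage figure the dashboard is described as showing.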

GitNexus
.gitignore-style Patterns
GitNexus uses .gitignore-style patterns for scope control. This is functional and familiar to developers, but there's no visual interface — you edit config files directly. You can't easily see at a glance which files are included or excluded, or what percentage of your codebase is covered.
Serena
LSP Workspace Scope
Serena scopes to whatever the LSP can see. There's no independent scope configuration. If the Language Server indexes it, Serena can query it; if not, it can't.
Understand-Anything
Auto Project Detection
The project-scanner agent auto-detects files, languages, and frameworks. Knowledge-base mode accepts an explicit path argument. There's no visual include/exclude tree exposed for fine-grained scope control within a repo — scope is implicit in what the scanner finds, with no way to focus the AI on a specific subsystem of a large monorepo.
bloop
Repo-Level Selection
bloop lets you choose which repositories to index. This is scope control at the repo level, which is useful for multi-repo setups. However, there's no file-level or folder-level control within a repo, and no visual tree for fine-tuning what's included.
Grepai
CLI Path Arguments
Grepai accepts path arguments on the command line. This is basic but functional for one-off searches. There's no persistent scope configuration or visual management.
Empirica
Git Repo Scope Only
Empirica scopes to the entire git repository. There's no fine-grained file or folder control. This makes sense for its epistemic-tracking purpose but doesn't allow developers to focus the AI on specific areas of a large monorepo.
Vexp
VS Code Workspace Scope
Vexp scopes to the VS Code workspace and supports include/exclude patterns in settings. This is adequate for single-workspace projects. However, there's no visual tree view for managing scope, and the settings are buried in VS Code's configuration UI rather than being front-and-center in a purpose-built dashboard.
Edge & Module Weighting
Prioritizing what matters most in the graph
SourcePrep
Configurable Edge Weights + Module Importance

SourcePrep assigns edge weights by kind (call, import, containment, inferred, LSP) that affect trace expansion priority. Module importance scores from the enrichment pipeline influence which context gets included first when token budgets are tight. The dashboard exposes these weights, letting developers fine-tune how the graph prioritizes different parts of the codebase — for example, boosting your core business logic over utility helpers.
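
Weighted trace expansion can be sketched as a best-first traversal in which each hop multiplies the running score by the weight of the edge kind it crosses. The weight values, the EDGE_WEIGHTS table, and the expand function are hypothetical; only the list of edge kinds comes from the description above.

```python
import heapq

# Hypothetical weights per edge kind; higher means expanded sooner.
EDGE_WEIGHTS = {
    "call": 1.0,
    "lsp": 0.9,
    "containment": 0.8,
    "import": 0.6,
    "inferred": 0.4,
}


def expand(graph, start, limit):
    """Visit up to `limit` nodes in order of descending path score.

    graph: dict mapping node -> list of (neighbor, edge_kind).
    """
    best = {start: 1.0}
    heap = [(-1.0, start)]  # max-heap via negated scores
    visited, order = set(), []
    while heap and len(order) < limit:
        neg_score, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        for neighbor, kind in graph.get(node, ()):
            score = -neg_score * EDGE_WEIGHTS[kind]
            if neighbor not in visited and score > best.get(neighbor, 0.0):
                best[neighbor] = score
                heapq.heappush(heap, (-score, neighbor))
    return order


graph = {
    "handle_login": [("find_user", "call"), ("logging", "import")],
    "find_user": [("db_connect", "call")],
    "logging": [("format_line", "inferred")],
}
print(expand(graph, "handle_login", limit=4))
```

With these weights the call chain is followed before the import edge, so a tight limit keeps the nodes most directly involved in execution.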

GitNexus
Graph Centrality Metrics
GitNexus uses graph centrality metrics to rank nodes in its precomputed clusters. This implicitly weights important hub files higher. It's an automated, sensible approach. However, there are no user-facing controls to override the heuristics — you can't tell the system that your 'auth' module matters more than your 'utils' module.
Serena
No Ranking
Serena returns LSP results without ranking or weighting. All symbols are treated equally — the response to 'find references' includes every reference with no prioritization by importance.
Understand-Anything
Auto Layer Grouping
The architecture-analyzer agent automatically groups files into architectural layers (API / Service / Data / UI / Utility), and tour-builder orders learning walkthroughs by dependency. This is implicit weighting baked into the agent prompts. No user-facing controls to override the LLM's layer assignments or boost specific modules over others.
bloop
Vector Similarity Only
bloop ranks results by vector similarity. The search is effective but there's no graph-based weighting or user-configurable prioritization of modules or file groups.
Grepai
Embedding Similarity Only
Grepai ranks results by embedding similarity score only. There's no structural weighting, no graph-based prioritization, and no way to influence ranking beyond the query text.
Empirica
N/A
Empirica doesn't model code structure, so graph weighting isn't applicable to its approach. Its focus is on the agent's epistemic state, not code topology.
Vexp
Graph Centrality in Ranking
Vexp incorporates graph centrality into its search ranking. Similar concept to SourcePrep's edge weights but not user-configurable. The ranking is purely algorithmic with no developer input on priorities.
Privacy & Local-First
Where your code data lives
SourcePrep
100% Local: Rust + ONNX, Zero Cloud

Everything in SourcePrep runs 100% locally by default. The Rust parser, ONNX embeddings (nomic-embed-text-v1.5), SQLite storage, and the dashboard all work fully offline. No code leaves your machine unless you explicitly configure team sync to your own S3 bucket or opt into a bring-your-own-key cloud LLM for enrichment. The ONNX runtime embeds at ~7ms per query with zero cloud dependencies, zero API keys, and zero data transmission.

GitNexus
Local (Node.js + WASM option)
GitNexus runs locally via Node.js CLI and offers an innovative browser-based WASM option that requires zero installation. Both modes are fully offline. Their WASM approach means you can even run it in a sandboxed browser tab. SourcePrep's native Rust engine is faster for large codebases, but GitNexus's zero-install browser option is a genuinely clever distribution strategy.
Serena
Local Server (LLM calls needed)
Serena runs locally as an MCP server, querying the local LSP. The tool itself is private. However, it's designed to be used with cloud-hosted LLMs, so code context inevitably flows to the model provider when the agent uses Serena's results.
Understand-Anything
Local Artifact, LLM Calls Required
The output graph stays on disk as a local JSON file, but the six-agent indexing pipeline requires LLM calls — code is sent to whichever provider you configure. There's no ONNX-equivalent local model path for the build itself. SourcePrep's full local-first story (Rust parser + ONNX embeddings + zero cloud calls by default) is structurally different.
bloop
Local (Qdrant Instance)
bloop runs locally with its own Qdrant vector database instance. Fully offline capable with a good privacy story. Comparable to SourcePrep's local-first approach.
Grepai
Privacy-First Local
Grepai is explicitly privacy-first with local embeddings. Strong privacy story, comparable to SourcePrep's approach. Both tools keep everything on-device with zero cloud dependencies for the core functionality.
Empirica
Git-Native (LLM calls needed)
Empirica stores everything in git notes — maximally local and version-controlled, which is excellent. However, the pre/postflight epistemic assessments require LLM calls, which means code context may be sent to cloud providers depending on configuration. The storage layer is private but the reasoning layer may not be.
Vexp
Local (VS Code Extension)
Vexp is fully local-first, running entirely within VS Code. No cloud calls. SQLite storage stays on disk. Strong privacy story — comparable to SourcePrep's approach. The main difference is SourcePrep works across any IDE via MCP and can run headlessly.

Companion, not a framework

SourcePrep isn't trying to replace your favorite tools. It's an MCP-native context engine designed to supercharge the AI IDEs and agents you already use.

Cursor

AI IDE

Connect SourcePrep via MCP to give Cursor perfect project-wide context without copying files.

Windsurf

AI IDE

Windsurf agents use SourcePrep to autonomously navigate the trace graph before editing.

Cline / Roo

VS Code Extension

Stop dropping raw files into context. Give your agent the SourcePrep LOD capsule instead.

Claude Code

CLI Agent

Supercharge Anthropic's CLI with blazing-fast local ONNX semantic routing.