Detailed Comparison

A comprehensive breakdown of how SourcePrep's architecture compares to other AI coding tools and context engines.

Category	Feature	SourcePrep (Continuous Graph RAG)	GitNexus (Precomputed RAG)	Serena (LSP Agent Toolkit)	Understand-Anything (LLM Knowledge Graph)	bloop (AST Search Tools)	Grepai (CLI Semantic Search)	Empirica (Epistemic Agents)	Vexp (AST Context Engine)
Architecture	Graph Construction: how the codebase is parsed and understood	Native Rust Engine (Tree-sitter)	Node.js/WASM	Active LSP Server	LLM Multi-Agent Pipeline	Rust (Tree-sitter)	Text Index	Git Notes/No Graph	SQLite/Tree-sitter
Architecture	Search Architecture: how relevant context is found	Local ONNX Embeddings + BM25	KuzuDB/FTS	LSP Queries	Fuzzy + Semantic	Local Qdrant/Vector	Local Semantic Index	Git Commit Hashes	FTS5 + TF-IDF (No Embeddings)
Context Assembly	Context Delivery: what the AI actually receives	LOD Capsule Context	Precomputed Raw Graph Data	Raw Symbol Matches	JSON Graph + Slash Commands	Raw Snippets	Raw File Chunks	Reasoning Checkpoints	Capsule Context
Context Assembly	Token Efficiency: minimizing distractor tokens	Dual-Engine Compression (3–20x)	High (via Precomputation)	Low (Full Symbols)	Plain-English Node Summaries	Low (Full snippets)	Low (Sends full chunks)	Low (State Dumps)	High (Signature Only)
Epistemology & Trust	LLM Augmentation: how AI deepens the knowledge graph	Flexible AI Pipeline (Cloud BYOK or Local)	None (Static Graph)	None (LSP Only)	LLM-First (6 Agents)	None (Index Only)	None (Embeddings Only)	Agent-Driven LLM Assessment	None (Static AST)
Epistemology & Trust	Continuous Enrichment: refining understanding over time	Trace Epistemology Pipeline	Static until re-indexed	None	Incremental LLM Re-analysis	None	None	Git-Native Pre/Postflight	Session Memory
Epistemology & Trust	Drift Detection: knowing when agent assumptions are stale	Automated via Watcher & Graph	Manual git-diff checks	None	Diff Impact (on demand)	None	None	Mirror Drift Detection	Manual Observation Staling
Epistemology & Trust	Inspectability: seeing what the AI sees	Dedicated Desktop Health Dashboard	Web UI/Terminal	Opaque	Interactive Dashboard + Demo	Desktop App	Terminal Only	Git Log Only	VS Code Only
Control & Customization	Scope Management: controlling what the AI can see	Visual Folder-Tree with Include/Exclude	.gitignore-style Patterns	LSP Workspace Scope	Auto Project Detection	Repo-Level Selection	CLI Path Arguments	Git Repo Scope Only	VS Code Workspace Scope
Control & Customization	Edge & Module Weighting: prioritizing what matters most in the graph	Configurable Edge Weights + Module Importance	Graph Centrality Metrics	No Ranking	Auto Layer Grouping	Vector Similarity Only	Embedding Similarity Only	N/A	Graph Centrality in Ranking
Control & Customization	Privacy & Local-First: where your code data lives	100% Local: Rust + ONNX, Zero Cloud	Local (Node.js + WASM option)	Local Server (LLM calls needed)	Local Artifact, LLM Calls Required	Local (Qdrant Instance)	Privacy-First Local	Git-Native (LLM calls needed)	Local (VS Code Extension)

Architecture

Graph Construction

How the codebase is parsed and understood

SourcePrep

Native Rust Engine (Tree-sitter)

SourcePrep's Rust-native parser uses Tree-sitter to build a complete structural trace graph offline. Unlike tools that depend on an active IDE or LSP server, SourcePrep works headlessly — in CI/CD, on servers, or anywhere Rust runs. The parser handles 15+ languages and produces call-graph, import, and containment edges in a single pass.
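
To make the three edge kinds concrete, here is a minimal sketch that collects call, import, and containment edges from one file in a single traversal. It is only an illustration: it uses Python's standard-library ast module as a stand-in for Tree-sitter, and the Edge tuple and extract_edges helper are hypothetical names, not SourcePrep's API.

```python
import ast
from typing import NamedTuple


class Edge(NamedTuple):
    kind: str    # "call", "import", or "contains"
    source: str  # enclosing scope (module, class, or function)
    target: str  # callee, imported module, or contained definition


def extract_edges(source_code: str, module_name: str) -> list[Edge]:
    """Walk the syntax tree once and collect all three edge kinds."""
    edges: list[Edge] = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            child_scope = scope
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                child_scope = f"{scope}.{child.name}"
                edges.append(Edge("contains", scope, child_scope))
            elif isinstance(child, ast.Import):
                for alias in child.names:
                    edges.append(Edge("import", scope, alias.name))
            elif isinstance(child, ast.ImportFrom) and child.module:
                edges.append(Edge("import", scope, child.module))
            elif isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                # Simplification: only direct calls by bare name are recorded.
                edges.append(Edge("call", scope, child.func.id))
            visit(child, child_scope)

    visit(ast.parse(source_code), module_name)
    return edges


SAMPLE = """
import hashlib

def check_password(pw):
    return digest(pw)

def digest(pw):
    return pw
"""

edges = extract_edges(SAMPLE, "auth")
for edge in edges:
    print(edge)
```

A production parser would also resolve method calls and cross-file targets; the sketch leaves those out to keep the single-pass idea visible.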

GitNexus

Node.js/WASM

GitNexus uses a Node.js/WASM architecture with Tree-sitter running in JavaScript. This works well for smaller repos, and their browser-based WASM option is genuinely innovative — zero installation needed. However, the Node.js runtime adds overhead for large codebases, and the browser sandbox limits memory. SourcePrep's native Rust engine is significantly faster for repos over 10K files.

Serena

Active LSP Server

Serena delegates all parsing to an active Language Server running in your IDE. This gives perfect type-resolved accuracy when the LSP is available, which is genuinely better than static analysis for type inference. However, it fails when the server isn't running, isn't configured for your language, or isn't available at all, as in headless environments. SourcePrep's offline Rust parser works without any running IDE process and produces a persistent graph that survives restarts.

Understand-Anything

LLM Multi-Agent Pipeline

Understand-Anything builds its graph through six specialized LLM agents (project-scanner, file-analyzer, architecture-analyzer, tour-builder, graph-reviewer, domain-analyzer) rather than a static parser. Using the LLM as the parser scales to many languages without per-language tooling — a genuinely clever distribution strategy. The tradeoff is that every full refresh costs LLM calls and carries hallucination risk. SourcePrep's deterministic Rust + Tree-sitter graph is reproducible, free to rebuild, and doesn't drift between runs.

bloop

Rust (Tree-sitter)

bloop also uses a Rust-native Tree-sitter parser, matching SourcePrep's parsing quality and speed. Their AST analysis is solid and well-engineered — credit where it's due. Where SourcePrep differentiates is in what happens after parsing: SourcePrep enriches the graph with LLM-inferred edges, epistemic understanding scores, and continuous deep analysis that evolves over time.

Grepai

Text Index

Grepai builds a basic text index for semantic search but doesn't parse code structure at all. There are no call-graph edges, no containment relationships, and no module boundaries. It's a powerful search tool, but SourcePrep provides full structural understanding on top of semantic search.

Empirica

Git Notes/No Graph

Empirica doesn't build a code graph at all. It focuses on epistemic state tracking via git notes — a fundamentally different philosophy. This is powerful for agent coordination and tracking what the AI thinks it knows, but provides no structural understanding of how the codebase is organized. SourcePrep combines structural graph analysis with epistemic enrichment, giving you both.

Vexp

SQLite/Tree-sitter

Vexp builds an AST graph using Tree-sitter stored in SQLite. This is a solid, well-engineered approach — fast for moderate repos and tightly integrated with VS Code. However, Vexp is locked to VS Code as its distribution mechanism. SourcePrep's standalone daemon works with any editor via MCP, can run headlessly for CI/CD team builds, and enriches the graph with LLM-inferred edges beyond what static parsing provides.

Search Architecture

How relevant context is found

SourcePrep

Local ONNX Embeddings + BM25

SourcePrep combines local ONNX embeddings (nomic-embed-text-v1.5) with BM25 keyword search in a hybrid architecture. Semantic search handles conceptual queries ('find the authentication flow') while BM25 catches exact identifiers ('handleLogin'). Everything runs 100% locally with no cloud dependency — embedding latency is ~7ms per query. Intent-aware routing automatically picks the best strategy per query.
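
The hybrid idea described above can be sketched as a weighted blend of a lexical score and an embedding similarity. This is an illustration under stated assumptions, not SourcePrep's implementation: the BM25 function is a common textbook variant, the two-dimensional vectors are hand-made stand-ins for real embeddings, and looks_like_identifier and the alpha weights are hypothetical.

```python
import math
import re
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """BM25 score of each tokenized document for the query terms."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def looks_like_identifier(query):
    """Hypothetical intent heuristic: one camelCase or snake_case token."""
    is_single_token = bool(re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", query))
    return is_single_token and ("_" in query or query != query.lower())


def hybrid_rank(query, query_vec, docs, doc_vecs):
    # Identifier-like queries lean on BM25; conceptual queries lean on embeddings.
    alpha = 0.8 if looks_like_identifier(query) else 0.2
    lexical = bm25_scores(query.split(), docs)
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    semantic = [cosine(query_vec, v) for v in doc_vecs]
    blended = [alpha * l / top + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)


docs = [["def", "handleLogin", "user", "password"], ["def", "render_sidebar", "items"]]
doc_vecs = [[0.9, 0.1], [0.1, 0.9]]  # toy vectors, not model output

print(hybrid_rank("render_sidebar", [0.5, 0.5], docs, doc_vecs))
print(hybrid_rank("find the authentication flow", [0.8, 0.2], docs, doc_vecs))
```

In the second query no document shares a term with the query, so the lexical scores are all zero and the embedding similarity alone decides the order.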

GitNexus

KuzuDB/FTS

GitNexus uses KuzuDB (an embedded graph database with vector support) and full-text search. This is a capable architecture, especially strong for graph traversal queries like 'what calls this function?' GitNexus deserves credit for integrating graph-native vector search. SourcePrep's advantage is the hybrid BM25+ONNX approach with intent-aware routing that automatically detects whether a query needs semantic, structural, or trace-based search.

Serena

LSP Queries

Serena queries the running Language Server for symbol lookups (find_symbol, find_references). This gives perfect accuracy for structured queries but cannot handle natural language or conceptual searches. You can ask 'find all callers of handleLogin' but not 'find the authentication flow.' SourcePrep handles both structured and natural language queries.

Understand-Anything

Fuzzy + Semantic

Understand-Anything supports fuzzy and semantic search across its node graph, which handles both literal lookups and conceptual queries through a clean slash-command UX. Where SourcePrep differs: hybrid BM25 + ONNX scoring with intent-aware routing automatically picks the right strategy per query, and trace-based expansion includes callers, callees, and module context for any matched node — Understand-Anything returns nodes, not relationship-expanded context.

bloop

Local Qdrant/Vector

bloop uses Qdrant (a local vector database) for semantic search. Their approach is well-engineered and handles semantic queries effectively. SourcePrep's advantage is the hybrid BM25+ONNX approach combined with intent-aware routing and trace-based expansion — when a function is found, SourcePrep automatically includes its callers, callees, and module context.

Grepai

Local Semantic Index

Grepai provides solid local semantic search with a privacy-first local embedding index. It handles natural language queries well and has clean MCP integration. However, it's purely a search tool — no graph traversal, no trace expansion, no module-aware routing. SourcePrep layers semantic search on top of a full trace graph that understands call relationships and module boundaries.

Empirica

Git Commit Hashes

Empirica doesn't provide code search. It references code via git commit hashes and file paths — its purpose is tracking the agent's epistemic state, not finding relevant code. These are complementary concerns, not competing ones.

Vexp

FTS5 + TF-IDF (No Embeddings)

Vexp explicitly avoids embeddings, using FTS5 + TF-IDF + graph centrality instead. They position this as faster and simpler, and for exact keyword matches, it works very well. But TF-IDF fundamentally cannot match 'authentication' to 'login' — it only finds literal string overlaps. SourcePrep's ONNX embeddings handle conceptual similarity while still being fully local and completing in under 10ms.

Context Assembly

Context Delivery

What the AI actually receives

SourcePrep

LOD Capsule Context

SourcePrep delivers LOD capsule context: full source for focal nodes, signatures+docstrings for adjacent nodes, and module summaries for distant context. This gives the AI a natural zoom-in/zoom-out perspective that mirrors how human developers understand code. The result is rich, structured context that maximizes signal per token.
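
The level-of-detail rule above can be sketched as a lookup keyed on graph distance from the focal node. This is a simplified illustration: the Node class, the hop thresholds (0 for full source, 1 for signatures, anything further for module summaries), and assemble_capsule are assumptions made for the example, and docstrings are omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    module: str
    signature: str
    source: str


def graph_distances(start, neighbors):
    """Breadth-first hop count from the focal node to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def assemble_capsule(focal, nodes, neighbors, module_summaries):
    """Full source at distance 0, signatures at distance 1, module summaries beyond."""
    parts, seen_modules = [], set()
    ordered = sorted(graph_distances(focal, neighbors).items(), key=lambda kv: kv[1])
    for name, hops in ordered:
        node = nodes[name]
        if hops == 0:
            parts.append(node.source)
        elif hops == 1:
            parts.append(node.signature)
        elif node.module not in seen_modules:
            seen_modules.add(node.module)
            parts.append(f"[{node.module}] {module_summaries[node.module]}")
    return "\n".join(parts)


nodes = {
    "login": Node("login", "auth", "def login(user, pw)",
                  "def login(user, pw):\n    return verify(user, pw)"),
    "verify": Node("verify", "auth", "def verify(user, pw)",
                   "def verify(user, pw):\n    return lookup(user) == pw"),
    "lookup": Node("lookup", "db", "def lookup(user)",
                   "def lookup(user):\n    return TABLE[user]"),
}
neighbors = {"login": ["verify"], "verify": ["lookup"]}
module_summaries = {"auth": "Credential checks.", "db": "User table access."}

capsule = assemble_capsule("login", nodes, neighbors, module_summaries)
print(capsule)
```

Only the focal function keeps its body; the neighbor shrinks to one signature line and the distant node collapses into its module's summary.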

GitNexus

Precomputed Raw Graph Data

GitNexus precomputes clusters and execution flows, then returns the raw graph data. This is more structured than sending raw files — the AI gets relational context instead of flat text. However, the AI still needs to parse the graph relationships itself. SourcePrep pre-assembles the context into human-readable capsules so the AI doesn't waste tokens interpreting graph structure.

Serena

Raw Symbol Matches

Serena returns raw symbol definitions and references from the LSP. These are accurate but uncompressed — you get the full function body, all references, with no prioritization or level-of-detail control. SourcePrep's LOD compression ensures the AI receives the right level of detail for each piece of context based on its distance from the focal point.

Understand-Anything

JSON Graph + Slash Commands

Output is a committable .understand-anything/knowledge-graph.json artifact that agents query through slash commands (/understand-explain, /understand-chat, /understand-onboard). Per-node plain-English summaries are a real strength for human readability. The graph itself is the deliverable — there's no per-query LOD assembly that compresses adjacent nodes to signatures and distant nodes to module summaries the way SourcePrep's capsule does.

bloop

Raw Snippets

bloop returns raw code snippets matching the search query. The snippets are accurate and include surrounding context lines for readability, which is a nice touch. However, they lack structural context — there's no information about callers, imports, or module relationships that would help the AI understand how the code fits into the larger system.

Grepai

Raw File Chunks

Grepai returns raw file chunks matching the search query. There's no structural awareness, no LOD compression, and no context about how the matched code relates to the rest of the codebase. The search quality is good, but the delivery format wastes tokens on irrelevant surrounding code.

Empirica

Reasoning Checkpoints

Empirica delivers epistemic reasoning checkpoints — what the agent knew, what it learned, what changed. This is valuable for agent coordination but is orthogonal to code context delivery. It tells the agent about its own state, not about the codebase structure. Both types of context are useful; SourcePrep focuses on the code side.

Vexp

Capsule Context

Vexp implements capsule context very similarly to SourcePrep: full source for pivot nodes, signatures for neighbors. Credit where it's due: this is one of the closest approaches to SourcePrep's LOD system and validates the core idea. The difference is SourcePrep's dual-engine compression (LOD for code, LLMLingua-2 for docs) and module-summary injection, which provide additional layers of context beyond what Vexp includes. SourcePrep's dashboard also lets you visually inspect the assembled capsule before it's sent.

Token Efficiency

Minimizing distractor tokens

SourcePrep

Dual-Engine Compression (3–20x)

SourcePrep achieves 3–20x token compression through a dual-engine approach: LOD-based structural compression for code (signatures instead of full bodies) and LLMLingua-2 token pruning for documentation (~2.4x). The compression level adapts dynamically per query and per client tier: Claude/Gemini get more full-source files, while local models get tighter compression to fit constrained windows.
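
As a rough illustration of how a tighter budget yields a higher compression ratio, the sketch below starts every node at signature level and upgrades nodes to full source, in relevance order, while the budget allows. The token counts, budgets, and the fit_to_budget helper are invented for the example and do not describe SourcePrep's actual thresholds.

```python
def fit_to_budget(nodes, budget):
    """nodes: (name, full_tokens, signature_tokens) tuples, most relevant first.

    Every node starts as a signature; nodes are upgraded to full source
    in order for as long as the running total stays within the budget."""
    used = sum(sig for _, _, sig in nodes)
    levels = {name: "signature" for name, _, _ in nodes}
    for name, full, sig in nodes:
        extra = full - sig
        if used + extra <= budget:
            levels[name] = "full"
            used += extra
    return levels, used


# Hypothetical token counts for three functions.
nodes = [("login", 400, 20), ("verify", 300, 15), ("lookup", 500, 25)]
raw_total = sum(full for _, full, _ in nodes)

for budget in (1000, 500, 100):
    levels, used = fit_to_budget(nodes, budget)
    print(budget, levels, f"{raw_total / used:.1f}x")
```

With a generous budget most bodies survive and the ratio stays low; with a very small one every node is reduced to its signature and the ratio climbs sharply.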

GitNexus

High (via Precomputation)

GitNexus achieves high efficiency through precomputation — complex graph queries are resolved before the AI asks, so the response is already focused. This is a legitimate efficiency win that we respect. However, the precomputed responses are static and can't adapt their compression level based on the specific query. SourcePrep dynamically adjusts LOD per query, compressing more aggressively for broad questions and less for targeted ones.

Serena

Low (Full Symbols)

Serena returns full symbol bodies from the LSP. A single find_references call can return thousands of tokens of raw code. There's no compression, prioritization, or level-of-detail control.

Understand-Anything

Plain-English Node Summaries

Each node carries a natural-language summary, which is itself a form of compression: the agent reads prose instead of full source. This is well-suited to the explain/onboard use cases the tool emphasizes. There is no dual-engine compression, no per-query LOD adjustment, and no client-tier-aware budgeting that scales context to the model's window.

bloop

Low (Full snippets)

bloop sends full code snippets with surrounding context. This is helpful for readability but increases token count significantly. There's no structural compression or level-of-detail control.

Grepai

Low (Sends full chunks)

Grepai sends full file chunks matching the search. No compression, no structural awareness of what parts of the chunk are relevant to the query.

Empirica

Low (State Dumps)

Empirica's epistemic state dumps can be verbose — serialized reasoning chains and pre/postflight checkpoints aren't optimized for token budgets. The content is high-value but the format isn't compressed.

Vexp

High (Signature Only)

Vexp achieves good efficiency by returning only signatures for non-focal nodes. This is the same core strategy as SourcePrep's LOD system, and it works well. Vexp's compression is query-adaptive and effective. SourcePrep's additional edge comes from dual-engine compression (LOD for code, LLMLingua-2 for docs), module-summary injection, tier-adaptive LOD thresholds, and the BM25+semantic scoring that better prioritizes which nodes to include at all.

Epistemology & Trust

LLM Augmentation

How AI deepens the knowledge graph

SourcePrep

Flexible AI Pipeline (Cloud BYOK or Local)

SourcePrep uses local or bring-your-own-key LLMs to continuously augment the structural trace graph with deep semantic understanding. The pipeline generates module summaries, infers cross-module relationships, computes understanding scores, and validates edge correctness — all automatically. This is not simple indexing: it's a multi-stage epistemic enrichment process where each pass deepens the AI's comprehension. You can run it with a local Ollama model for zero-cloud privacy, or use your own OpenAI/Anthropic key for maximum quality.

GitNexus

None (Static Graph)

GitNexus builds a structural graph using Tree-sitter and KuzuDB but does not use any LLM to augment or enrich it. The graph captures syntactic relationships (calls, imports, containment) but has no semantic understanding of what the code does, why modules exist, or how concepts relate across boundaries.

Serena

None (LSP Only)

Serena relies entirely on the Language Server Protocol for code understanding. No LLM is used to augment or enrich the data. The accuracy is limited to what the LSP can provide — type information and symbol references — with no semantic layer on top.

Understand-Anything

LLM-First (6 Agents)

Understand-Anything goes further than any other tool here: LLMs aren't an enrichment layer, they are the index. Six specialized agents perform the parsing, architecture detection, tour generation, and review. The depth of LLM-derived semantic understanding is genuinely impressive. The structural tradeoff is that there's no deterministic ground truth underneath — SourcePrep keeps a free, reproducible Rust graph and lets LLMs add concepts and rationale on top, so you can run with zero LLM budget if you choose.

bloop

None (Index Only)

bloop builds a vector index for search but does not use LLMs to augment the index with semantic understanding. The search is effective for finding code, but there's no deeper comprehension layer — no module summaries, no relationship inference, no understanding scores.

Grepai

None (Embeddings Only)

Grepai uses embedding models for semantic search but does not use LLMs to augment or enrich a knowledge graph. There's no epistemic pipeline, no module summarization, and no relationship inference. The embeddings enable similarity search but don't build understanding.

Empirica

Agent-Driven LLM Assessment

Empirica uses LLM calls during its pre/postflight epistemic assessments — agents evaluate their own knowledge before and after tasks. This is a form of LLM augmentation, but it's focused on the agent's self-awareness rather than enriching a code knowledge graph. It doesn't generate module summaries or infer structural relationships. SourcePrep's approach augments the graph itself, while Empirica augments the agent's understanding of its own state.

Vexp

None (Static AST)

Vexp's graph is purely structural — built from Tree-sitter AST analysis and FTS5 indexing. There is no LLM augmentation step. The system understands code structure but not code meaning. Agent-written observations can add some semantic context, but this is manual and agent-driven, not an automated enrichment pipeline.

Continuous Enrichment

Refining understanding over time

SourcePrep

Trace Epistemology Pipeline

SourcePrep's Trace Epistemology Pipeline continuously enriches the knowledge graph: deep analysis generates module summaries, cross-module relationship analysis, and understanding scores. Each pipeline run builds on previous results, and the file watcher triggers incremental re-enrichment when code changes. The result is a knowledge base that gets measurably smarter over time — visible in the dashboard's health scores.

GitNexus

Static until re-indexed

GitNexus builds its graph once and serves it statically until explicitly re-indexed. There's no continuous learning or enrichment between builds. For stable codebases this is fine, but for active development, the index quickly becomes stale.

Serena

None

Serena provides no enrichment. It queries the LSP in real-time and returns results. There's no persistent knowledge accumulation between sessions.

Understand-Anything

Incremental LLM Re-analysis

Understand-Anything supports incremental updates: only changed files get re-analyzed by the agent pipeline, with an optional --auto-update post-commit hook. This is a sensible design for keeping the graph fresh. Each refresh still costs LLM calls, whereas SourcePrep's enrichment pipeline can run on local Ollama at zero cloud cost and persists understanding scores per module that improve across runs.

bloop

None

bloop rebuilds its index from scratch. No continuous enrichment or persistent learning between index builds.

Grepai

None

Grepai rebuilds its index from scratch on each run. No continuous enrichment or persistent learning.

Empirica

Git-Native Pre/Postflight

Empirica genuinely excels here. Its pre/postflight system has agents assess their knowledge before and after tasks, storing these assessments in git notes for version-controlled epistemic continuity. This creates real cross-session learning. SourcePrep's pipeline is more automated (no agent involvement needed) and produces structured graph enrichment, but Empirica's approach to tracking what the agent thinks it knows is innovative and we tip our hat to it.

Vexp

Session Memory

Vexp supports session memory — agents can save observations attached to graph nodes, and these persist across sessions. This is a thoughtful feature that enables incremental learning. However, it relies entirely on the agent to drive enrichment by writing good observations. SourcePrep's pipeline runs automatically in the background with no agent involvement needed, producing structured module summaries and understanding scores.

Drift Detection

Knowing when agent assumptions are stale

SourcePrep

Automated via Watcher & Graph

SourcePrep's file watcher monitors the codebase for changes and automatically marks affected trace nodes, observations, and enrichment data as stale. When a function changes, all observations about that function are flagged. The dashboard shows drift status at a glance with per-node granularity. No manual intervention needed.

GitNexus

Manual git-diff checks

GitNexus can detect changes via git-diff but requires manual re-indexing to update the knowledge graph. There's no automatic staleness tracking for individual nodes — the entire index is either current or it isn't.

Serena

None

Serena has no drift detection. It queries the LSP live, so in theory results are always current — but it has no concept of tracking what changed or what assumptions from previous sessions might be stale.

Understand-Anything

Diff Impact (on demand)

/understand-diff analyzes which parts of the system are affected by a change — a real and useful feature for reviewing PRs. It runs when invoked rather than continuously flagging stale nodes after each edit. SourcePrep's file watcher marks individual nodes and observations stale automatically the moment their underlying code changes.

bloop

None

bloop has no drift detection. The index must be manually rebuilt when code changes.

Grepai

None

Grepai has no drift detection. The index must be manually rebuilt when code changes.

Empirica

Mirror Drift Detection

Empirica's Mirror Drift Detection is genuinely strong. It tracks capability drops and knowledge degradation across sessions, alerting when the agent's understanding has become unreliable. This is one of Empirica's best features — they focus deeply on epistemic reliability. SourcePrep's approach is more granular (per-node vs per-session) and more visual (dashboard vs git log), but Empirica deserves real credit for pioneering this concept.

Vexp

Manual Observation Staling

Vexp marks observations as stale when their linked nodes change — a correct and useful approach. However, this only works for nodes that have agent-written observations attached. There's no automatic detection of semantic drift in the broader graph for nodes without observations.

Inspectability

Seeing what the AI sees

SourcePrep

Dedicated Desktop Health Dashboard

SourcePrep's dedicated desktop dashboard lets you visually browse the trace graph, see module health scores, inspect enrichment pipeline status, and fine-tune scope with a folder tree. You can see exactly what context the AI will receive before it receives it. This bird's-eye perspective of your codebase builds trust and gives developers real control over the AI's knowledge.

GitNexus

Web UI/Terminal

GitNexus offers a web UI and terminal interface for browsing the graph. The web UI is functional and shows precomputed clusters and wiki documentation. It's less purpose-built for context-inspection than SourcePrep's dashboard but provides reasonable visibility into the knowledge graph.

Serena

Opaque

Serena is largely opaque. The MCP tools execute and return results, but there's no interface to see what the system 'knows,' how it's reasoning about the codebase, or what context it would assemble for a given query.

Understand-Anything

Interactive Dashboard + Demo

Understand-Anything has the strongest visualization story of any tool on this page: a polished interactive dashboard with force-directed graphs, automatic layer view, domain view, and a public live demo at understand-anything.com/demo with a committed reference graph. Credit where it's due — this is the bar for exploratory navigation. SourcePrep's dashboard is purpose-built for a different question: graph health, enrichment status, and scope control rather than free-form graph exploration.

bloop

Desktop App

bloop has a dedicated desktop app with a polished code search UI. Credit where it's due — bloop's search interface is clean, fast, and pleasant to use. However, it focuses on search results rather than graph health, enrichment status, or context assembly inspection. SourcePrep's dashboard is specifically built for understanding and controlling the AI's knowledge, not just searching code.

Grepai

Terminal Only

Grepai is a CLI tool — terminal output only. You can see search results but there's no way to visualize the index, understand coverage gaps, or inspect what context would be assembled.

Empirica

Git Log Only

Empirica stores everything in git notes, viewable via git log. This is maximally transparent — everything is version-controlled and auditable, which is admirable. But it requires git expertise to inspect and there's no visual dashboard for at-a-glance understanding of the epistemic state.

Vexp

VS Code Only

Vexp operates as a VS Code extension with in-editor views. You can see the graph within VS Code, which is convenient and well-integrated. However, it's limited to VS Code users and doesn't offer the bird's-eye project view, with health scores, scope management, and enrichment pipeline monitoring, that SourcePrep's standalone dashboard provides.

Control & Customization

Scope Management

Controlling what the AI can see

SourcePrep

Visual Folder-Tree with Include/Exclude

SourcePrep provides a visual folder-tree in the dashboard for precise scope control. Include or exclude entire directories, individual files, or use glob patterns. Changes take effect immediately and the dashboard shows exactly which files are in-scope, how many nodes are indexed, and what percentage of the codebase is covered. This gives developers fine-grained control over the AI's view of the project.
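
Include/exclude scoping with glob patterns, plus the coverage percentage mentioned above, can be sketched in a few lines. The example uses Python's standard fnmatch module; the rule that an exclude match always wins, and the in_scope and coverage helpers, are assumptions for illustration rather than SourcePrep's documented behavior.

```python
from fnmatch import fnmatch


def in_scope(path, include, exclude):
    """A path is in scope if it matches an include pattern and no exclude pattern."""
    if any(fnmatch(path, pattern) for pattern in exclude):
        return False
    return any(fnmatch(path, pattern) for pattern in include)


def coverage(paths, include, exclude):
    """Return the in-scope paths and the share of the codebase they represent."""
    selected = [p for p in paths if in_scope(p, include, exclude)]
    return selected, 100.0 * len(selected) / len(paths)


paths = [
    "src/auth/login.py",
    "src/generated/schema.py",
    "src/ui/app.min.js",
    "docs/readme.md",
]
include = ["src/*"]  # fnmatch's * also matches path separators
exclude = ["*/generated/*", "*.min.js"]

selected, percent = coverage(paths, include, exclude)
print(selected, f"{percent:.0f}% of files in scope")
```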

GitNexus

.gitignore-style Patterns

GitNexus uses .gitignore-style patterns for scope control. This is functional and familiar to developers, but there's no visual interface — you edit config files directly. You can't easily see at a glance which files are included or excluded, or what percentage of your codebase is covered.

Serena

LSP Workspace Scope

Serena scopes to whatever the LSP can see. There's no independent scope configuration. If the Language Server indexes it, Serena can query it; if not, it can't.

Understand-Anything

Auto Project Detection

The project-scanner agent auto-detects files, languages, and frameworks. Knowledge-base mode accepts an explicit path argument. There's no visual include/exclude tree exposed for fine-grained scope control within a repo — scope is implicit in what the scanner finds, with no way to focus the AI on a specific subsystem of a large monorepo.

bloop

Repo-Level Selection

bloop lets you choose which repositories to index. This is scope control at the repo level, which is useful for multi-repo setups. However, there's no file-level or folder-level control within a repo, and no visual tree for fine-tuning what's included.

Grepai

CLI Path Arguments

Grepai accepts path arguments on the command line. This is basic but functional for one-off searches. There's no persistent scope configuration or visual management.

Empirica

Git Repo Scope Only

Empirica scopes to the entire git repository. There's no fine-grained file or folder control. This makes sense for its epistemic-tracking purpose but doesn't allow developers to focus the AI on specific areas of a large monorepo.

Vexp

VS Code Workspace Scope

Vexp scopes to the VS Code workspace and supports include/exclude patterns in settings. This is adequate for single-workspace projects. However, there's no visual tree view for managing scope, and the settings are buried in VS Code's configuration UI rather than being front-and-center in a purpose-built dashboard.

Edge & Module Weighting

Prioritizing what matters most in the graph

SourcePrep

Configurable Edge Weights + Module Importance

SourcePrep assigns edge weights by kind (call, import, containment, inferred, LSP) that affect trace expansion priority. Module importance scores from the enrichment pipeline influence which context gets included first when token budgets are tight. The dashboard exposes these weights, letting developers fine-tune how the graph prioritizes different parts of the codebase — for example, boosting your core business logic over utility helpers.
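
To show how per-kind edge weights and module importance could steer expansion, the sketch below runs a best-first traversal in which each neighbor's priority is the parent's priority multiplied by the edge weight and the target module's importance. The weight values, the multiplication rule, and the expand helper are illustrative assumptions, not SourcePrep's actual scoring.

```python
import heapq

# Illustrative weights per edge kind; real values would be configurable.
EDGE_WEIGHTS = {"call": 1.0, "lsp": 0.9, "import": 0.6, "containment": 0.4, "inferred": 0.3}


def expand(focal, edges, module_importance, limit):
    """Best-first expansion from the focal node.

    edges maps a node to a list of (neighbor, edge_kind, neighbor_module)."""
    best = {focal: 1.0}
    heap = [(-1.0, focal)]  # max-heap via negated priority
    order = []
    while heap and len(order) < limit:
        neg_priority, node = heapq.heappop(heap)
        if node in order:
            continue  # an earlier, higher-priority entry already handled it
        order.append(node)
        for neighbor, kind, module in edges.get(node, []):
            priority = -neg_priority * EDGE_WEIGHTS[kind] * module_importance.get(module, 1.0)
            if priority > best.get(neighbor, 0.0):
                best[neighbor] = priority
                heapq.heappush(heap, (-priority, neighbor))
    return order


edges = {
    "checkout": [
        ("charge_card", "call", "billing"),
        ("format_date", "call", "utils"),
        ("Cart", "containment", "billing"),
    ]
}
module_importance = {"billing": 1.0, "utils": 0.2}  # core logic boosted over helpers

selected = expand("checkout", edges, module_importance, limit=3)
print(selected)
```

With room for only three nodes, the utility helper is dropped even though it is reached by a call edge, because its module carries a low importance score.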

GitNexus

Graph Centrality Metrics

GitNexus uses graph centrality metrics to rank nodes in its precomputed clusters. This implicitly weights important hub files higher. It's an automated, sensible approach. However, there are no user-facing controls to override the heuristics — you can't tell the system that your 'auth' module matters more than your 'utils' module.

Serena

No Ranking

Serena returns LSP results without ranking or weighting. All symbols are treated equally — the response to 'find references' includes every reference with no prioritization by importance.

Understand-Anything

Auto Layer Grouping

The architecture-analyzer agent automatically groups files into architectural layers (API / Service / Data / UI / Utility), and tour-builder orders learning walkthroughs by dependency. This is implicit weighting baked into the agent prompts. No user-facing controls to override the LLM's layer assignments or boost specific modules over others.

bloop

Vector Similarity Only

bloop ranks results by vector similarity. The search is effective but there's no graph-based weighting or user-configurable prioritization of modules or file groups.

Grepai

Embedding Similarity Only

Grepai ranks results by embedding similarity score only. There's no structural weighting, no graph-based prioritization, and no way to influence ranking beyond the query text.

Empirica

N/A

Empirica doesn't model code structure, so graph weighting isn't applicable to its approach. Its focus is on the agent's epistemic state, not code topology.

Vexp

Graph Centrality in Ranking

Vexp incorporates graph centrality into its search ranking. Similar concept to SourcePrep's edge weights but not user-configurable. The ranking is purely algorithmic with no developer input on priorities.

Privacy & Local-First

Where your code data lives

SourcePrep

100% Local: Rust + ONNX, Zero Cloud

Everything in SourcePrep runs 100% locally by default. The Rust parser, ONNX embeddings (nomic-embed-text-v1.5), SQLite storage, and the dashboard all work fully offline. No code ever leaves your machine unless you explicitly configure team sync to your own S3 bucket or opt into a bring-your-own-key cloud LLM for enrichment. The ONNX runtime embeds at ~7ms per query with zero cloud dependencies, zero API keys, and zero data transmission.

GitNexus

Local (Node.js + WASM option)

GitNexus runs locally via Node.js CLI and offers an innovative browser-based WASM option that requires zero installation. Both modes are fully offline. Their WASM approach means you can even run it in a sandboxed browser tab. SourcePrep's native Rust engine is faster for large codebases, but GitNexus's zero-install browser option is a genuinely clever distribution strategy.

Serena

Local Server (LLM calls needed)

Serena runs locally as an MCP server, querying the local LSP. The tool itself is private. However, it's designed to be used with cloud-hosted LLMs, so code context inevitably flows to the model provider when the agent uses Serena's results.

Understand-Anything

Local Artifact, LLM Calls Required

The output graph stays on disk as a local JSON file, but the six-agent indexing pipeline requires LLM calls — code is sent to whichever provider you configure. There's no ONNX-equivalent local model path for the build itself. SourcePrep's full local-first story (Rust parser + ONNX embeddings + zero cloud calls by default) is structurally different.

bloop

Local (Qdrant Instance)

bloop runs locally with its own Qdrant vector database instance. It is fully offline-capable with a good privacy story, comparable to SourcePrep's local-first approach.

Grepai

Privacy-First Local

Grepai is explicitly privacy-first with local embeddings. Strong privacy story, comparable to SourcePrep's approach. Both tools keep everything on-device with zero cloud dependencies for the core functionality.

Empirica

Git-Native (LLM calls needed)

Empirica stores everything in git notes — maximally local and version-controlled, which is excellent. However, the pre/postflight epistemic assessments require LLM calls, which means code context may be sent to cloud providers depending on configuration. The storage layer is private but the reasoning layer may not be.

Vexp

Local (VS Code Extension)

Vexp is fully local-first, running entirely within VS Code. It makes no cloud calls, and its SQLite storage stays on disk. This is a strong privacy story, comparable to SourcePrep's approach. The main difference is that SourcePrep works across any IDE via MCP and can run headlessly.

Companion, not a framework

SourcePrep isn't trying to replace your favorite tools. It's an MCP-native context engine designed to supercharge the AI IDEs and agents you already use.

Cursor

AI IDE

Connect SourcePrep via MCP to give Cursor perfect project-wide context without copying files.

Windsurf

AI IDE

Windsurf agents use SourcePrep to autonomously navigate the trace graph before editing.

Cline / Roo

VS Code Extension

Stop dropping raw files into context. Give your agent the SourcePrep LOD capsule instead.

Claude Code

CLI Agent

Supercharge Anthropic's CLI with blazing-fast local ONNX semantic routing.