> TheAuditor / blog
mcp, ai-agents, tokens, claude

Stop Reading Files: Why AI Coding Agents Should Query a Database Instead

AI agents waste tokens, hallucinate relationships, and miss cross-language flows because they read files instead of querying facts. We built the database that fixes it — and the MCP server that exposes it.

Watch any LLM coding agent work on a real codebase. It reads files. Then it reads more files. Then it re-reads files because it forgot what they said. Cross-file reasoning gets approximated by cramming file contents into the context window and hoping for the best.

Cross-language reasoning does not happen at all. The agent has no way to know that searchTerm in a .tsx component becomes sort_by in query_builder.py three services and two languages away.

The cost shows up everywhere: latency, hallucinated APIs, context exhaustion, an unbounded read loop on any non-trivial repo, and a “fix” that quietly breaks two unrelated files because the agent never saw the callers.

The fix: the database is the context

We ship an MCP server — aud-mcp — that exposes our normalized code graph as structured tools. The agent stops reading source. It asks the database. The database already knows the call graph, the import graph, the cross-language edges, the taint flows, the framework conventions, and the security findings, because aud full --offline computed them once and indexed them.
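To make "ask the database" concrete, here is a minimal sketch of a caller lookup against an indexed call graph. The schema, table name, and sample rows are all hypothetical (this post does not describe TheAuditor's real tables); the point is only that "who calls X?" becomes one indexed query instead of a repo-wide read.

```python
import sqlite3

# Hypothetical schema: one row per call edge. TheAuditor's real tables are not
# described in this post, so every name and row below is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (caller TEXT, callee TEXT, file TEXT, line INTEGER)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        ("SearchPage.onSubmit", "client.search", "SearchPage.tsx", 41),
        ("SearchController.search", "SearchService.run", "SearchController.java", 27),
        ("search_route", "build_query", "routes/search.py", 19),
        ("reindex_job", "build_query", "jobs/reindex.py", 88),
    ],
)


def callers_of(symbol: str) -> list[tuple[str, str, int]]:
    """Return (caller, file, line) for every recorded call to `symbol`."""
    rows = conn.execute(
        "SELECT caller, file, line FROM calls WHERE callee = ? ORDER BY file, line",
        (symbol,),
    )
    return rows.fetchall()


callers = callers_of("build_query")
for caller, file, line in callers:
    print(f"{caller}  {file}:{line}")
```

The answer comes back as rows with a file and a line, which is the shape an agent can act on without opening either file first.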

The nine MCP tools

Tool           What the agent gets back
aud_blueprint  Project-wide snapshot: structure, hot files, frameworks, API surface, import graph
aud_explain    Full briefing for a file, symbol, or component: symbols, callers, callees, deps, findings
aud_query      Precise lookups: callers of X, dependents of Y, pattern search, API coverage
aud_impact     Forward-reachability blast radius for a change, at a given depth
aud_search     Cross-table exploratory search with scope, kind, and limit filters
aud_findings   Findings from every scanner, filterable by bucket, severity, or file
aud_session    Recent activity: what changed, what ran, what failed
aud_analytics  Project-level analytics: churn, hot spots, error chains
aud_reindex    Re-run the pipeline when source has drifted. The only non-readonly tool.
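As a rough illustration of what "forward-reachability blast radius, at a given depth" means for aud_impact, the sketch below runs a depth-limited breadth-first walk over a toy dependents graph. The graph, the function name, and the return shape are assumptions made for this example, not the tool's actual implementation or output format.

```python
from collections import deque

# Hypothetical edges: "X -> things that depend on X". The file names are
# borrowed from the example later in this post; the edges are made up.
DEPENDENTS = {
    "core/query_builder.py": ["services/search_service.py"],
    "services/search_service.py": ["routes/search.py"],
    "routes/search.py": ["SearchService.java"],
    "SearchService.java": ["SearchController.java"],
}


def blast_radius(start: str, max_depth: int) -> dict[str, int]:
    """Breadth-first walk; returns each reached node with its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_depth:
            continue  # do not expand past the requested depth
        for nxt in DEPENDENTS.get(node, []):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    del seen[start]  # the changed file is not part of its own blast radius
    return seen


radius = blast_radius("core/query_builder.py", max_depth=2)
print(radius)
```

The depth bound is what keeps the answer proportional to the question: a depth of 2 returns the two nearest dependents here, not the whole chain.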

Before and after

The Architect asks the agent: “where does the user’s search term end up?”

Before MCP — the agent does the read-and-guess loop:

1.  Read SearchPage.tsx                      (file 1)
2.  Read api/client.ts                       (file 2)
3.  Read SearchController.java               (file 3)
4.  Read SearchService.java                  (file 4)
5.  Read AnalyticsClient.java                (file 5)
6.  Grep the repo for "search_term"          (tool call)
7.  Read routes/search.py                    (file 6)
8.  Read services/search_service.py          (file 7)
9.  Read core/query_builder.py               (file 8)
10. Reason about cross-language data flow    (and probably hallucinate)

Eight file reads, a grep, a round of speculation. The agent still may not catch that the sortBy field reaches the SQL ORDER BY clause, because nothing in the source told it the cross-language hop exists.

After MCP — same question, same agent:

1. aud_query symbol=searchTerm --show-data-deps --depth=5
   -> JSON: 8 hops, 3 languages,
      terminates at query_builder.py:68 cursor.execute
      (SQL Injection, VULNERABLE, full path returned)

One tool call. Structured answer. The full hop list with file and line at every step. The agent’s next action is to fix the bug — not to keep reading files trying to find it.
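For readers who have not looked at MCP traffic: a tool invocation is a JSON-RPC 2.0 request with the method tools/call, carrying the tool name and an arguments object. The sketch below builds such a request for the query above. The argument keys are guesses that mirror the flags in the example; the real aud_query input schema may differ.

```python
import json

# MCP tool invocations are JSON-RPC 2.0 requests with method "tools/call".
# The argument keys are assumptions mirroring the flags in the example above;
# they are not taken from the real aud_query schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "aud_query",
        "arguments": {
            "symbol": "searchTerm",
            "show_data_deps": True,
            "depth": 5,
        },
    },
}

wire = json.dumps(request)
print(wire)
```

The request is a few hundred bytes. Compare that with the eight file reads it replaces.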

How big the savings actually get

Two ways to measure this, and we want to be honest about both.

Per call, directly measurable. Across the representative MCP payload types we instrument (file briefings, symbol lookups, project blueprints), we typically see payload reductions of roughly 35-55%, and the three examples below range from 36.9% to 56.4%. Most of that comes from structured deduplication, schema-aware compaction, and the simple fact that the agent gets facts in the response, not a source-code dump it has to re-parse.

Representative target         JSON before  JSON after  Δ
TS file with 985 properties   28,628       18,071      -36.9%
Python file with 469 symbols  17,072       9,085       -46.8%
Class symbol with 15 callers  8,334        3,630       -56.4%

These are single-call measurements on representative artifacts — not end-to-end workflow benchmarks. Treat them as “what one query costs”, not “what your monthly bill drops by”.
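If you want to check the Δ column yourself, the arithmetic is just (after - before) / before. A short sketch using the before and after sizes from the table above; the function name is ours:

```python
# (before, after) sizes copied from the table above.
MEASUREMENTS = {
    "TS file with 985 properties": (28_628, 18_071),
    "Python file with 469 symbols": (17_072, 9_085),
    "Class symbol with 15 callers": (8_334, 3_630),
}


def reduction_pct(before: int, after: int) -> float:
    """Percentage change from `before` to `after`, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


deltas = {name: reduction_pct(b, a) for name, (b, a) in MEASUREMENTS.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta}%")
```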

Across a session, the wins compound. Eliminating re-reads, hitting the prompt cache on stable structural queries, narrower contexts that compact cleaner, fewer tool-call rounds before the agent reaches the answer: those effects stack on top of the per-call savings. Partner project Warden, in their TheAuditor integration spec, models the aggregate at a theoretical 85-95% token reduction on common investigation flows.*

*Aggregate savings depend on workflow, agent configuration, prompt-caching support, codebase size, query patterns, and how disciplined the agent is about querying before reading. Treat the upper bound as a model output, not a guarantee — your mileage will vary, and the direction is what matters.

Honest TL;DR: the per-call savings are real and measurable. The session-level savings are where the math gets interesting. And the bugs the read-and-guess loop used to quietly miss — the cross-file caller the agent never grepped for, the cross-language flow it could never have inferred — are worth more than either number.

Set it up

pip install theauditor                # public binary, lands soon
cd your-project
aud full --offline                    # index the repo once
aud setup-mcp .                       # writes .mcp.json + Claude Code config + hooks

Open Claude Code in that project. The agent gets the nine tools above on first invocation, and is pre-instructed to query the database before reading source. Type a prompt. Watch the tool calls go to aud_explain and aud_query, not Read.
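A quick sanity check after setup is to confirm that .mcp.json exists and see which servers it registers. The sketch below assumes the file keeps its servers under a top-level "mcpServers" object, which is the usual project-level layout; the exact entry that aud setup-mcp writes is not shown in this post, so the sample entry here is purely hypothetical.

```python
import json
import tempfile
from pathlib import Path


def list_mcp_servers(project_dir: str) -> list[str]:
    """Return the server names registered in <project_dir>/.mcp.json, if any."""
    config_path = Path(project_dir) / ".mcp.json"
    if not config_path.exists():
        return []
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: servers live under a top-level "mcpServers" object.
    return sorted(config.get("mcpServers", {}))


# Demo against a throwaway directory. The entry name and command are
# hypothetical stand-ins, not what the installer actually writes.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"mcpServers": {"aud-mcp": {"command": "aud-mcp", "args": []}}}
    Path(tmp, ".mcp.json").write_text(json.dumps(sample), encoding="utf-8")
    servers = list_mcp_servers(tmp)
    missing = list_mcp_servers(str(Path(tmp) / "no-such-project"))

print(servers, missing)
```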

What ships with the MCP layer

The boring-but-load-bearing things you need before security and compliance teams will let an agent talk to your codebase:

  • Rate limiting per license tier — agents cannot blow your token budget with a runaway loop.
  • Result size capping — big questions get truncated answers instead of streaming a million rows into the agent’s context.
  • TTL result cache — re-asking the same question in the same session is free.
  • Per-call audit log — every tool call by every agent is recorded.

All of it is in the package, on by default.
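To show how two of those guardrails fit together, here is a minimal sketch of a TTL result cache that also caps result size and flags truncation. The class, its parameters, and the result shape are illustrative assumptions, not the shipped implementation.

```python
import time
from typing import Any, Callable


class TTLResultCache:
    """Caches query results for `ttl` seconds and caps rows per result."""

    def __init__(
        self,
        ttl: float,
        max_rows: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self.max_rows = max_rows
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, dict[str, Any]]] = {}

    def get_or_run(self, key: str, run_query: Callable[[], list]) -> dict[str, Any]:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the query entirely
        rows = run_query()
        result = {
            "rows": rows[: self.max_rows],
            "truncated": len(rows) > self.max_rows,  # tell the agent it was cut
        }
        self._store[key] = (now, result)
        return result


counter = {"runs": 0}


def expensive_query() -> list:
    counter["runs"] += 1
    return list(range(10))


cache = TTLResultCache(ttl=60.0, max_rows=3)
first = cache.get_or_run("callers:build_query", expensive_query)
second = cache.get_or_run("callers:build_query", expensive_query)
print(first, counter["runs"])
```

The second call never reaches the query function, and the capped result carries an explicit truncated flag so the agent knows to narrow the question instead of assuming it saw everything.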

Honest disclaimers

The binary is pre-launch. The MCP server, the nine tools, and the aud setup-mcp installer are all implemented and exercised against our internal test surfaces today. Public release lands when the compiled artifact clears the same OWASP corpora the source already does. No date promise. When it ships, it ships.

Read more

Pair us with Warden — the multi-provider, terminal-native coding agent also built around MCP — and you get the agent-side savings on top of these. The companion post walks through the integration mechanics.

Subscribe via the signup form on the main site for launch notifications.