
Pair Warden with TheAuditor

Provider freedom + serious token economics from Warden. Verified-fact code intelligence from TheAuditor. Why the pairing isn't a coincidence.

Most AI coding agents lock you into two things: a single LLM provider, and a single view of your codebase. Warden breaks the first lock. TheAuditor breaks the second. Pairing them is the rare case where two pre-launch tools were built by separate teams, each with the other in mind.

What Warden buys you

Warden is a lean, multi-provider, terminal-native LLM coding agent — single-binary CLI written in Python 3.14, MIT licensed, 42,800 LOC under a strict 13-tier import DAG enforced by import-linter on every PR.

It does what every other agent does — 21 builtin tools, full MCP client and server, session persistence, 22 hook event types, 6 permission modes, 4-tier memory walker — but with two engineering choices most proprietary agents skip:

1. Provider freedom

Three providers are fully wired today: Anthropic, OpenAI (Platform API + Responses subscription tier with OAuth PKCE flow), and Gemini. Three more are scaffolded (Bedrock, Vertex, Foundry) with an explicit NotImplementedError on construction, so you can’t accidentally use a half-finished adapter. That makes 24 model entries across the six providers. The same CLI talks to whichever provider isn’t currently rate-limiting you, mid-task.
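To make the "fail on construction" idea concrete, here is a minimal sketch. The class names, the registry, and make_adapter are hypothetical and are not Warden's actual code; the only detail taken from the description above is that a scaffolded adapter raises NotImplementedError as soon as it is constructed.

```python
from typing import Callable


class ProviderAdapter:
    """Base class for the hypothetical adapters in this sketch."""

    name = "base"

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


class AnthropicAdapter(ProviderAdapter):
    name = "anthropic"

    def complete(self, prompt: str) -> str:
        # Placeholder: a real adapter would call the provider's API here.
        return f"[{self.name}] {prompt}"


class BedrockAdapter(ProviderAdapter):
    name = "bedrock"

    def __init__(self) -> None:
        # Scaffolded adapter: refuse to exist rather than half-work.
        raise NotImplementedError("bedrock adapter is scaffolded, not wired")


# Hypothetical registry mapping provider names to adapter constructors.
REGISTRY: dict[str, Callable[[], ProviderAdapter]] = {
    "anthropic": AnthropicAdapter,
    "bedrock": BedrockAdapter,
}


def make_adapter(name: str) -> ProviderAdapter:
    """Construct an adapter by name; scaffolded ones raise immediately."""
    return REGISTRY[name]()
```

Failing in the constructor means the error surfaces at startup or at provider switch, not halfway through a turn.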

2. Serious token economics

This is where Warden’s engineering discipline shows.

Prompt caching works on the wire end-to-end via cache_control markers. The plumbing was non-trivial: per changelog entry W2.2, earlier composition-root drift silently dropped the markers because ApiRequestParams.system passed list[str] instead of typed list[TextBlock], so the Anthropic adapter’s _system_to_wire emitted plain {type: text, text} dicts with no cache_control. That is now fixed end-to-end; the OpenAI and Gemini adapters drop the marker with a per-provider DEBUG log.
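The bug class is easy to see in a few lines. The sketch below is an illustration, not Warden's code: TextBlock, system_to_wire, and strings_to_wire are hypothetical stand-ins. The one detail taken from outside this post is the shape of the marker, which follows Anthropic's documented ephemeral cache_control block.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical typed system block; `cache` marks a cache breakpoint."""

    text: str
    cache: bool = False


def system_to_wire(blocks: list[TextBlock]) -> list[dict[str, Any]]:
    """Serialize typed blocks, keeping the cache marker when it is set."""
    wire = []
    for block in blocks:
        item: dict[str, Any] = {"type": "text", "text": block.text}
        if block.cache:
            # Shape of Anthropic's documented ephemeral cache marker.
            item["cache_control"] = {"type": "ephemeral"}
        wire.append(item)
    return wire


def strings_to_wire(system: list[str]) -> list[dict[str, Any]]:
    """The lossy path: plain strings have nowhere to carry the marker."""
    return [{"type": "text", "text": s} for s in system]
```

Once the system prompt is flattened to list[str], the marker is gone before the adapter ever sees it, and nothing downstream can put it back.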

Auto-compaction fires at ~187K estimated tokens with four surgical strategies: system-reminder dedup, image age strip, tool-result age decay, mandatory orphan cleanup. Or invoke manually via /compact.
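A rough sketch of how a threshold-triggered pass might chain two of those strategies. Everything here is an assumption for illustration: the Message type, the four-characters-per-token estimate, the stub text, and the keep_last parameter are ours, and only the ~187K trigger and the strategy names come from the description above.

```python
from dataclasses import dataclass, replace

COMPACT_THRESHOLD = 187_000  # approximate trigger quoted in the post


@dataclass(frozen=True)
class Message:
    kind: str  # "user", "assistant", "system_reminder", or "tool_result"
    text: str


def estimate_tokens(messages: list[Message]) -> int:
    """Crude estimate, assuming roughly four characters per token."""
    return sum(len(m.text) for m in messages) // 4


def dedup_reminders(messages: list[Message]) -> list[Message]:
    """Keep only the most recent copy of each identical system reminder."""
    seen: set[str] = set()
    kept: list[Message] = []
    for m in reversed(messages):
        if m.kind == "system_reminder":
            if m.text in seen:
                continue
            seen.add(m.text)
        kept.append(m)
    kept.reverse()
    return kept


def decay_tool_results(messages: list[Message], keep_last: int = 2) -> list[Message]:
    """Replace all but the newest `keep_last` tool results with a short stub."""
    idx = [i for i, m in enumerate(messages) if m.kind == "tool_result"]
    stale = set(idx[:-keep_last]) if keep_last else set(idx)
    return [
        replace(m, text="[tool result elided]") if i in stale else m
        for i, m in enumerate(messages)
    ]


def maybe_compact(messages: list[Message], threshold: int = COMPACT_THRESHOLD) -> list[Message]:
    """Run the strategies only once the estimate crosses the threshold."""
    if estimate_tokens(messages) < threshold:
        return messages
    return decay_tool_results(dedup_reminders(messages))
```

Each strategy is a pure function from message list to message list, so they compose in any order and can be tested one at a time.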

Hard cost governance — 12-model price table with Decimal accounting, configurable per-session USD cap, 80%/100% threshold warnings. Per-turn stats footer in the REPL renderer tells you exactly what each turn cost. The dedicated token_budget/ subsystem ships an estimator, persistor, analyzer, cache-break telemetry, and a budget tracker.
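For a feel of what Decimal accounting with a session cap looks like, here is a small sketch. The prices, the model name, and the BudgetTracker class are hypothetical, not Warden's price table or API; the 80% and 100% thresholds are the ones named above.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens; not Warden's real table.
PRICES = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}
MILLION = Decimal(1_000_000)


class BudgetTracker:
    """Accumulate per-turn cost and report where spend sits against the cap."""

    def __init__(self, cap_usd: Decimal) -> None:
        self.cap = cap_usd
        self.spent = Decimal("0")

    def record(self, model: str, input_tokens: int, output_tokens: int) -> str:
        """Add one turn's cost; return "ok", "warn" (>= 80%), or "cap" (>= 100%)."""
        price = PRICES[model]
        cost = (price["input"] * input_tokens + price["output"] * output_tokens) / MILLION
        self.spent += cost
        if self.spent >= self.cap:
            return "cap"
        if self.spent >= self.cap * Decimal("0.8"):
            return "warn"
        return "ok"
```

Decimal keeps the running total exact in base ten, so repeated small additions do not drift the way binary floats can when compared against a hard cap.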

Most agents say “we use prompt caching.” Warden ships the audit trail.

What TheAuditor buys you

This site is TheAuditor’s. We do the opposite job: we stop the agent from guessing.

Every LLM coding agent suffers from the same failure mode: it reads 2,000 lines of code, infers a call graph from indentation and comments, hallucinates three of the relationships, and writes a “fix” that quietly breaks two other files. TheAuditor replaces that read-and-guess loop with a database query. Symbols, callers, callees, taint flows, framework boundaries, and EIDL evasion signals are pre-computed, indexed, and queryable in under a millisecond.
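As a toy illustration of "query instead of guess", the sketch below answers "who calls this symbol?" from an indexed table. The schema, the sample rows, and callers_of are invented for this example and say nothing about TheAuditor's real database layout or tool signatures.

```python
import sqlite3

# Hypothetical schema: one row per resolved call site.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT, file TEXT, line INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        ("checkout", "charge_card", "orders.py", 41),
        ("retry_payment", "charge_card", "billing.py", 88),
        ("checkout", "send_receipt", "orders.py", 57),
    ],
)
# Index the lookup column so caller queries avoid a full table scan.
conn.execute("CREATE INDEX idx_calls_callee ON calls (callee)")


def callers_of(symbol: str) -> list[tuple[str, str, int]]:
    """Return (caller, file, line) rows for every recorded call to `symbol`."""
    rows = conn.execute(
        "SELECT caller, file, line FROM calls WHERE callee = ? ORDER BY file, line",
        (symbol,),
    )
    return rows.fetchall()
```

Calling callers_of("charge_card") returns both call sites with their file and line, which is the answer an agent would otherwise have to infer by reading both files.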

The honest pitch: little or no token saving on any single call, huge hallucination reduction. Calling aud_explain on a file costs more tokens than naively reading the file once. But what dominates the math is everything it eliminates: re-reads, mis-edits, “let me just check this neighbouring file too” rabbit holes, and bullshit refactors that don’t match how the codebase actually calls into the symbol you’re touching.

Warden’s own integration spec (architecture/24-theauditor-integration.md) puts the number at 85-95% token savings on common investigation flows.

Concrete numbers from our MCP-tool token-optimization audit (per-call wire bytes):

| Target | JSON before (bytes) | JSON after (bytes) | Δ |
| --- | --- | --- | --- |
| TS file (985 properties) | 28,628 | 18,071 | -36.9% |
| Python file (469 symbols) | 17,072 | 9,085 | -46.8% |
| Class symbol (15 callers, dups) | 8,334 | 3,630 | -56.4% |

That’s per-call after Markdown rendering. Compounded across a debugging session, the savings are what make the pairing actually land.
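The Δ column follows directly from the before and after figures. A few lines reproduce it; the dictionary and the helper name are just for this check.

```python
# Before/after wire bytes per call, copied from the audit figures above.
rows = {
    "TS file (985 properties)": (28_628, 18_071),
    "Python file (469 symbols)": (17_072, 9_085),
    "Class symbol (15 callers, dups)": (8_334, 3_630),
}


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`, to one decimal place."""
    return round((before - after) / before * 100, 1)


deltas = {name: reduction_pct(b, a) for name, (b, a) in rows.items()}
for name, pct in deltas.items():
    print(f"{name}: -{pct}%")
```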

Coverage is honest too. We score a 100% true positive rate at a 0% false positive rate on OWASP Java (11/11), OWASP Python, and OWASP Juice Shop (31/31). We don’t ship risk scores or subjective ratings; we ship facts. Twelve languages with parity across indexing / taint / CFG / call graph / rules: Python, TypeScript / JavaScript, Java (three-lane CST + Javac), Go, Rust, PHP, Bash, Vue, Svelte / SvelteKit, GitHub Actions, Terraform / HCL, AWS CDK.

Why the pairing isn’t a coincidence

Warden’s architecture/24-theauditor-integration.md is an entire spec dedicated to integrating with TheAuditor. It describes the integration roadmap with phrases like “MCP-first architecture means TheAuditor’s MCP server is just another connected server with no special-casing required” and “Python-native means cross-language boundary friction disappears.”

That’s not retroactive marketing. Both projects were built knowing the other existed.

The integration surface is small because the architecture is right:

  • warden install --with-code-intel writes a .mcp.json snippet pointing at aud-mcp (TheAuditor’s stdio MCP server) plus a SessionStart hook that runs aud full --offline --fast in the background. The model gets fresh database state on every connect.
  • MCP prompts → Warden skills. TheAuditor’s /theauditor:planning, /theauditor:security, /theauditor:impact slash commands surface in Warden as /theauditor:* skills via Warden’s existing MCP-prompt-to-skill bridge. No code changes either side.
  • Context Gate (optional). Warden’s PreToolUse hook can hard-block Edit/Write until the model has called aud_explain on the target file. Enforcement is pure bash; the policy is yours.
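The Context Gate policy is simple enough to state in code. The sketch below models only the decision logic, in Python for readability; the real enforcement is a bash hook, and the ContextGate class, its check method, and the idea of passing a bare file path are assumptions made for this illustration.

```python
class ContextGate:
    """Block Edit/Write on a file until aud_explain has been called on it."""

    GATED_TOOLS = {"Edit", "Write"}

    def __init__(self) -> None:
        # Files the model has already asked aud_explain about this session.
        self.explained: set[str] = set()

    def check(self, tool_name: str, file_path: str) -> tuple[bool, str]:
        """Return (allowed, reason) for one tool call."""
        if tool_name == "aud_explain":
            self.explained.add(file_path)
            return True, "recorded"
        if tool_name in self.GATED_TOOLS and file_path not in self.explained:
            return False, f"call aud_explain on {file_path} before {tool_name}"
        return True, "ok"
```

The gate is per file, not per session: explaining one file does not unlock edits to its neighbours.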

Set it up in three commands

pip install warden theauditor                 # both are pip-installable, Python 3.14+

cd your-project
aud full --offline                            # index — 30s for small projects, 10 min for 100K+ LOC
warden install --with-code-intel              # writes .mcp.json + SessionStart hook

Open a Warden session. The model gets TheAuditor’s 8 MCP tools — aud_explain, aud_query, aud_findings, aud_impact, aud_blueprint, aud_session, aud_reindex, aud_analytics — on first invocation. Type a prompt. Watch the model call aud_explain instead of Read on the file it’s about to edit, and watch your token bill drop.

Honest disclaimers

Both projects are pre-launch. We’re not hiding it.

  • Warden is Pre-Alpha (v0.1.0). APIs and on-disk layouts may shift between waves. Three providers fully wired; three scaffolded. WebSearch is Anthropic-only today. The --permission-mode CLI flag is currently dropped on the floor at bootstrap.py:408-414 — set the mode via the runtime /plan command instead, or via permissions.default_mode in settings.json.
  • TheAuditor binary hasn’t shipped publicly yet. The Python source is being packaged via Nuitka with SQLCipher-encrypted analysis databases. Validation against OWASP Java/Python and Juice Shop benchmarks is complete (100% TPR / 0% FPR on all three). The public binary lands when adversarial-string-scan checks on the compiled artifact all pass.

What ships, ships. No vaporware promises.

Read the other side

The Warden team wrote the complementary post from their angle — “TheAuditor + Warden: stop guessing, start querying” — focused on the agent-side integration mechanics (hooks, skills, MCP transport, client init flags).

Subscribe via the signup form on the main site for launch notifications. We’ll only email when there’s something real to share.