Pair Warden with TheAuditor
Provider freedom + serious token economics from Warden. Verified-fact code intelligence from TheAuditor. Why the pairing isn't a coincidence.
Most AI coding agents lock you into two things: a single LLM provider, and a single view of your codebase. Warden breaks the first lock. TheAuditor breaks the second. Pairing them is the rare case where two pre-launch tools were built separately but with each other in mind.
What Warden buys you
Warden is a lean, multi-provider, terminal-native LLM
coding agent — single-binary CLI written in Python 3.14, MIT licensed,
42,800 LOC under a strict 13-tier import DAG enforced by import-linter on every PR.
It does what every other agent does — 21 builtin tools, full MCP client and server, session persistence, 22 hook event types, 6 permission modes, 4-tier memory walker — but with two engineering choices most proprietary agents skip:
1. Provider freedom
Three providers fully wired today — Anthropic, OpenAI (Platform API +
Responses subscription tier with OAuth PKCE flow), Gemini. Three more scaffolded
(Bedrock, Vertex, Foundry) with explicit NotImplementedError on construction so
you can’t accidentally use a half-finished adapter. 24 model entries across the
six providers. The same CLI talks to whichever provider isn’t currently
rate-limiting you, mid-task.
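A minimal sketch of the two ideas in this section: a scaffolded adapter that refuses construction, and a call that falls through to the next provider when one is rate-limited. Every name here (RateLimited, EchoAdapter, ScaffoldedAdapter, complete_with_fallback) is hypothetical and chosen for illustration. Warden's real adapter interfaces are not shown in this post, and whether Warden switches providers automatically or on request is not stated.

```python
from dataclasses import dataclass


class RateLimited(Exception):
    """Raised by a (hypothetical) adapter when its provider throttles a request."""


@dataclass
class EchoAdapter:
    """Stand-in for a fully wired provider adapter."""
    name: str
    throttled: bool = False

    def complete(self, prompt: str) -> str:
        if self.throttled:
            raise RateLimited(self.name)
        return f"[{self.name}] {prompt}"


class ScaffoldedAdapter:
    """Stand-in for a scaffolded adapter: it cannot even be constructed."""

    def __init__(self) -> None:
        raise NotImplementedError("adapter is scaffolded, not wired")


def complete_with_fallback(adapters: list[EchoAdapter], prompt: str) -> str:
    """Try each wired adapter in order, skipping any that is rate-limited."""
    for adapter in adapters:
        try:
            return adapter.complete(prompt)
        except RateLimited:
            continue
    raise RuntimeError("every configured provider is rate-limited")


adapters = [EchoAdapter("anthropic", throttled=True), EchoAdapter("openai")]
reply = complete_with_fallback(adapters, "hello")
```

Failing in the constructor rather than on first use means a half-finished adapter can never end up in the list of candidates in the first place.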
2. Serious token economics
This is where Warden’s engineering discipline shows.
Prompt caching on the wire end-to-end via cache_control markers — non-trivial
plumbing per the changelog (entry W2.2: earlier composition-root drift was silently
dropping the markers because ApiRequestParams.system was passing list[str]
instead of typed list[TextBlock]; the Anthropic adapter’s _system_to_wire then
emitted plain {type: text, text} dicts with no cache_control). Fixed end-to-end;
OpenAI/Gemini drop the marker with a per-provider DEBUG log.
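To make the W2.2 bug concrete, here is a sketch of why passing plain strings loses the marker. The TextBlock shape, its cache flag, and both function names are assumptions for illustration, not Warden's actual types. The wire format follows Anthropic's documented convention of attaching cache_control with type "ephemeral" to a system text block.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TextBlock:
    text: str
    cache: bool = False  # hypothetical flag: mark this block as a cache breakpoint


def system_to_wire(blocks: list[TextBlock]) -> list[dict[str, Any]]:
    """Typed path: the cache flag survives and becomes a cache_control marker."""
    wire: list[dict[str, Any]] = []
    for block in blocks:
        item: dict[str, Any] = {"type": "text", "text": block.text}
        if block.cache:
            item["cache_control"] = {"type": "ephemeral"}
        wire.append(item)
    return wire


def system_to_wire_from_strings(parts: list[str]) -> list[dict[str, Any]]:
    """Drifted path: a plain str has nowhere to carry the flag, so it is lost."""
    return [{"type": "text", "text": part} for part in parts]


typed = system_to_wire([TextBlock("You are a coding agent.", cache=True)])
plain = system_to_wire_from_strings(["You are a coding agent."])
```

Both paths produce valid requests, which is why this kind of drift is silent: nothing fails, the cache just never hits.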
Auto-compaction fires at ~187K estimated tokens with four surgical strategies:
system-reminder dedup, image age strip, tool-result age decay, mandatory orphan
cleanup. Or invoke manually via /compact.
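A rough sketch of how two of those strategies could work, assuming a message history stored as a list of dicts with "kind" and "content" keys. The message shape, the four-characters-per-token estimate, the stub text, and all function names are hypothetical. Only the ~187K trigger and the strategy names come from the text above.

```python
COMPACT_THRESHOLD = 187_000  # approximate trigger quoted in the text


def estimate_tokens(messages: list[dict]) -> int:
    """Crude estimate, assuming roughly 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4


def dedup_system_reminders(messages: list[dict]) -> list[dict]:
    """Keep only the most recent copy of each identical system reminder."""
    seen: set[str] = set()
    kept: list[dict] = []
    for m in reversed(messages):
        if m["kind"] == "system_reminder":
            if m["content"] in seen:
                continue
            seen.add(m["content"])
        kept.append(m)
    kept.reverse()
    return kept


def decay_old_tool_results(messages: list[dict], keep_recent: int = 2) -> list[dict]:
    """Replace all but the newest `keep_recent` tool results with a short stub."""
    tool_idx = [i for i, m in enumerate(messages) if m["kind"] == "tool_result"]
    stale = set(tool_idx[:-keep_recent]) if keep_recent else set(tool_idx)
    return [
        {**m, "content": "[tool result elided]"} if i in stale else m
        for i, m in enumerate(messages)
    ]


def maybe_compact(messages: list[dict], threshold: int = COMPACT_THRESHOLD) -> list[dict]:
    """Leave history untouched below the threshold, otherwise apply both passes."""
    if estimate_tokens(messages) < threshold:
        return messages
    return decay_old_tool_results(dedup_system_reminders(messages))


history = [
    {"kind": "system_reminder", "content": "Stay in plan mode."},
    {"kind": "tool_result", "content": "x" * 400},
    {"kind": "system_reminder", "content": "Stay in plan mode."},
    {"kind": "tool_result", "content": "y" * 400},
    {"kind": "tool_result", "content": "z" * 400},
]
compacted = maybe_compact(history, threshold=100)  # low threshold to force a run
```

In this example the older duplicate reminder is dropped and the oldest tool result is replaced by a stub, while the two most recent tool results stay intact.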
Hard cost governance — 12-model price table with Decimal accounting,
configurable per-session USD cap, 80%/100% threshold warnings. Per-turn stats
footer in the REPL renderer tells you exactly what each turn cost. The dedicated
token_budget/ subsystem ships an estimator, persistor, analyzer, cache-break
telemetry, and a budget tracker.
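The cap-and-warning behaviour can be sketched with Decimal arithmetic as follows. The prices, model name, class name, and status strings are made up for the example and are not Warden's real price table or API. Only the Decimal accounting, the per-session USD cap, and the 80%/100% thresholds come from the text.

```python
from decimal import Decimal

# Hypothetical USD prices per million tokens; not Warden's real price table.
PRICES = {"example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")}}
MILLION = Decimal(1_000_000)


class BudgetTracker:
    """Accumulates per-turn cost and reports which threshold has been crossed."""

    def __init__(self, cap_usd: Decimal) -> None:
        self.cap = cap_usd
        self.spent = Decimal("0")

    def record_turn(self, model: str, input_tokens: int, output_tokens: int) -> str:
        price = PRICES[model]
        cost = (price["input"] * input_tokens + price["output"] * output_tokens) / MILLION
        self.spent += cost
        if self.spent >= self.cap:
            return "cap_reached"
        if self.spent >= self.cap * Decimal("0.8"):
            return "warning_80"
        return "ok"


tracker = BudgetTracker(Decimal("1.00"))
first = tracker.record_turn("example-model", 100_000, 20_000)   # adds $0.60
second = tracker.record_turn("example-model", 50_000, 5_000)    # adds $0.225
third = tracker.record_turn("example-model", 50_000, 5_000)     # adds $0.225
```

Decimal rather than float keeps the running total exact, so a cap comparison never flips because of binary rounding.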
Most agents say “we use prompt caching.” Warden ships the audit trail.
What TheAuditor buys you
This site is TheAuditor’s. We do the opposite job: we stop the agent from guessing.
Every LLM coding agent suffers from the same failure mode: read 2,000 lines of code, infer a call graph from indentation and comments, hallucinate three of the relationships, write a “fix” that quietly breaks two other files. TheAuditor replaces that read-and-guess loop with a database query. Symbols, callers, callees, taint flows, framework boundaries, EIDL evasion signals: all pre-computed, indexed, and queryable in under a millisecond.
The honest pitch: tiny token reduction per call, huge hallucination reduction.
Calling aud_explain on a file saves little, if anything, over naively reading the file
once. But the elimination of re-reads, mis-edits, “let me just check this
neighbouring file too” rabbit holes, and bullshit refactors that don’t actually
match how the codebase calls into the symbol you’re touching dominates the math.
Warden’s own integration spec (architecture/24-theauditor-integration.md) puts
the number at 85-95% token savings on common investigation flows.
Concrete numbers from our MCP-tool token-optimization audit (per-call wire bytes):
| Target | JSON before | JSON after | Δ |
|---|---|---|---|
| TS file (985 properties) | 28,628 | 18,071 | -36.9% |
| Python file (469 symbols) | 17,072 | 9,085 | -46.8% |
| Class symbol (15 callers, dups) | 8,334 | 3,630 | -56.4% |
That’s per-call after Markdown rendering. Compounded across a debugging session, the savings are what make the pairing actually land.
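The Δ column is the relative change in bytes, (after - before) / before. A quick check of the table's arithmetic, using only the numbers shown above (the dictionary keys are shorthand labels, not identifiers from the audit):

```python
# (before, after) wire bytes, copied from the table above.
rows = {
    "ts_file": (28_628, 18_071),
    "python_file": (17_072, 9_085),
    "class_symbol": (8_334, 3_630),
}


def reduction_pct(before: int, after: int) -> float:
    """Relative change in percent, rounded to one decimal place."""
    return round((after - before) / before * 100, 1)


deltas = {name: reduction_pct(before, after) for name, (before, after) in rows.items()}
```

Computed this way, the three rows come out at -36.9, -46.8, and -56.4, matching the table.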
Coverage is honest too. We have 100% True Positive Rate at 0% False Positive Rate on OWASP Java (11/11), OWASP Python, and OWASP Juice Shop (31/31). We don’t ship risk scores or subjective ratings — we ship facts. Twelve languages with parity across indexing / taint / CFG / call graph / rules: Python, TypeScript / JavaScript, Java (three-lane CST + Javac), Go, Rust, PHP, Bash, Vue, Svelte / SvelteKit, GitHub Actions, Terraform / HCL, AWS CDK.
Why the pairing isn’t a coincidence
Warden’s architecture/24-theauditor-integration.md is an entire spec dedicated
to integrating with TheAuditor. It describes the integration roadmap with phrases
like “MCP-first architecture means TheAuditor’s MCP server is just another
connected server with no special-casing required” and “Python-native means
cross-language boundary friction disappears.”
That’s not retroactive marketing. Both projects were built knowing the other existed.
The integration surface is small because the architecture is right:
- warden install --with-code-intel writes a .mcp.json snippet pointing at aud-mcp (TheAuditor’s stdio MCP server) plus a SessionStart hook that runs aud full --offline --fast in the background. The model gets fresh database state on every connect.
- MCP prompts → Warden skills. TheAuditor’s /theauditor:planning, /theauditor:security, /theauditor:impact slash commands surface in Warden as /theauditor:* skills via Warden’s existing MCP-prompt-to-skill bridge. No code changes either side.
- Context Gate (optional). Warden’s PreToolUse hook can hard-block Edit/Write until the model has called aud_explain on the target file. Enforcement is pure bash; the policy is yours.
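The optional Context Gate described above boils down to a small piece of state: which files has the model already asked TheAuditor about? The real enforcement is pure bash, per the text. The Python below is only an illustration of the policy logic, and the class name, method name, and call shape are hypothetical.

```python
class ContextGate:
    """Blocks Edit/Write on a file until aud_explain has been called on it."""

    GATED_TOOLS = {"Edit", "Write"}

    def __init__(self) -> None:
        self.explained: set[str] = set()

    def on_tool_call(self, tool: str, path: str) -> bool:
        """Return True to allow the call, False to block it."""
        if tool == "aud_explain":
            self.explained.add(path)
            return True
        if tool in self.GATED_TOOLS:
            return path in self.explained
        return True  # every other tool is outside the gate's scope


gate = ContextGate()
blocked_first = gate.on_tool_call("Edit", "app.py")      # not explained yet
gate.on_tool_call("aud_explain", "app.py")
allowed_after = gate.on_tool_call("Edit", "app.py")      # now permitted
```

The gate is per file, so explaining one file does not unlock edits to another.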
Set it up in three commands
pip install warden theauditor # both are pip-installable, Python 3.14+
cd your-project
aud full --offline # index — 30s for small projects, 10 min for 100K+ LOC
warden install --with-code-intel # writes .mcp.json + SessionStart hook
Open a Warden session. The model gets TheAuditor’s 8 MCP tools — aud_explain,
aud_query, aud_findings, aud_impact, aud_blueprint, aud_session,
aud_reindex, aud_analytics — on first invocation. Type a prompt. Watch the
model call aud_explain instead of Read on the file it’s about to edit, and
watch your token bill drop.
Honest disclaimers
Both projects are pre-launch. We’re not hiding it.
- Warden is Pre-Alpha (v0.1.0). APIs and on-disk layouts may shift between waves. Three providers fully wired; three scaffolded. WebSearch is Anthropic-only today. The --permission-mode CLI flag is currently dropped on the floor at bootstrap.py:408-414; set the mode via the runtime /plan command instead, or via permissions.default_mode in settings.json.
- TheAuditor binary hasn’t shipped publicly yet. The Python source is being packaged via Nuitka with SQLCipher-encrypted analysis databases. Validation against OWASP Java/Python and Juice Shop benchmarks is complete (100% TPR / 0% FPR on all three). The public binary lands when adversarial-string-scan checks on the compiled artifact all pass.
What ships, ships. No vaporware promises.
Read the other side
The Warden team wrote the complementary post from their angle —
“TheAuditor + Warden: stop guessing, start querying”
— focused on the agent-side integration mechanics (hooks, skills, MCP transport,
client init flags).
Subscribe via the signup form on the main site for launch notifications. We’ll only email when there’s something real to share.