Dogfooding TheAuditor on Our Own License Server
Days from public launch, we pointed our own SAST at the license server we ship on. 204 findings, 0 launch-blockers, and the false-positive rate became the next product release.
We’re a week from flipping api.theauditortool.com to public. The
license server behind that hostname is the same one every paying user
will eventually trust to issue, validate, and revoke their copy of
our tool. So before we let anyone else point a scanner at it, we
ran our own scanner over it.
One command. aud full --offline against www/. 204 findings.
Here is what fell out — both the bugs we caught and the part most vendors hide: the false positives, the rule upgrades they forced, and the operational gear that ended up next to the scanner output.
The triage breakdown
We don’t ship risk scores. We ship facts. So the only honest way to talk about a 204-finding report is to walk the triage in numbers:
| Bucket | Count | What it means |
|---|---|---|
| Launch-blocking vulnerabilities | 0 | No critical chains reached an exploitable sink. Verified by hand. |
| Confirmed false positives | ~185 | Pattern-correct, context-wrong. The interesting bucket. |
| LOW/INFO hardening items worth fixing | 6 | Real, non-launch-blocking, fixed in two shipped sprints + one queued. |
| Moderate transitive CVEs | 5 | Bumped via package.json / pnpm.overrides. Zero direct-dep impact. |
| Other noise (annotated in-source) | small | Path Traversal on dev scripts, etc. Marked as known-FP in the source tree. |
The headline number is 0. The interesting number is 185.
The six hardening items (categories, not exploit recipes)
We deliberately keep this list at the category level. The point of publishing dogfooding results is to show the process, not to hand a roadmap to anyone scanning our origin tomorrow.
Two sprints already shipped:
- Credentials in structured logs. Multiple code paths emitted a credential-shaped value into the structured logger on the happy path. Replaced every credential field in every log payload with an internal UUID that identifies the same record for operations but has zero authentication value to whoever reads the log. Hardware fingerprints dropped entirely.
- Identity re-registration edge case. A telemetry endpoint had an idempotent-by-design path that, in one branch, returned a signing secret in the response if the same identifier was submitted twice. Changed to issue the secret once per identifier and forever after return {already_registered: true} with no secret. Client side learned to rotate the identifier when it detects this state.
- HTTP error mapping. Malformed JSON bodies were returning 500 INTERNAL_ERROR. Now they return 400 INVALID_JSON. The 5xx counter is reserved for actual server failures.
- Security-header dedup. nginx and the app framework were both setting X-Frame-Options, HSTS, and friends: different layers, same headers, undefined-behavior collisions. Pick one. We picked the app layer (it travels with the binary across proxy changes).
- Unbounded retention query. A metrics endpoint was materializing a 90-day cohort install-id list into Node memory per request, then sending it back to Postgres as a parameter list. Rewrote as a single server-side CTE: one row out, no cohort materialization.
- Promise.all -> Promise.allSettled. A dashboard endpoint ran 11 sub-queries in parallel and rejected the whole response if any one failed. Now individual failures degrade their tile instead of taking the whole dashboard dark.
- Dev DB port leak. A dev compose file bound Postgres to 0.0.0.0:5433 with POSTGRES_PASSWORD: dev. Anyone on the same WiFi could connect. Now bound to 127.0.0.1 with a top-of-file comment so the next person doesn’t compose up on a prod host.
- Prod compose resource limits. Added explicit mem_limit and cpus to both api and db services. A runaway query no longer starves the VPS.
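The Promise.all -> Promise.allSettled item is the easiest of these to show in code. What follows is a minimal TypeScript sketch of the pattern, not our production handler: the Tile type, the loadDashboard function, and the tile names in the usage note are all hypothetical names chosen for illustration.

```typescript
// Hypothetical tile shape: each dashboard tile carries either data or an error marker.
type Tile<T> =
  | { name: string; ok: true; data: T }
  | { name: string; ok: false; error: string };

// Run every sub-query in parallel. A rejected sub-query degrades only its own
// tile instead of rejecting the whole response, as Promise.all would.
async function loadDashboard<T>(
  queries: Record<string, () => Promise<T>>
): Promise<Tile<T>[]> {
  const names = Object.keys(queries);
  // The async wrapper also turns a synchronous throw into a rejection.
  const settled = await Promise.allSettled(
    names.map(async (name) => queries[name]())
  );
  return settled.map((result, i): Tile<T> =>
    result.status === "fulfilled"
      ? { name: names[i], ok: true, data: result.value }
      : { name: names[i], ok: false, error: String(result.reason) }
  );
}
```

Called with, say, an installs query that resolves and a revenue query that rejects, this returns two tiles: one with ok: true and its data, one with ok: false and the error text. The caller decides how a degraded tile renders.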
One sprint queued (in-flight as we publish this):
- Timestamp window on signed telemetry events (closes replay).
- Structured-logger redact paths for auth and signature headers.
- Per-request timeout + graceful SIGTERM/SIGINT drain with pg pool flush + keep-alive socket draining (nginx upstream uses keepalive 32, so a plain server.close() won’t drain it).
- Docker prod api filesystem flipped to read_only: true with tmpfs mounts for /tmp and /var/run. The application writes nothing to disk at runtime, so this hardening costs nothing.
- Tightening a generated-credential regex to match the generator’s actual entropy alphabet, with a load-bearing comment on the generator so a future “simplification” can’t desync them.
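For the timestamp-window item, the check itself is small. Here is a minimal TypeScript sketch under stated assumptions: the five-minute tolerance, the SignedEvent shape, and the acceptEvent name are hypothetical, and signatureValid stands in for whatever real signature verification runs first. The one hard requirement is that the timestamp sits inside the signed payload, otherwise a replayer could simply rewrite it.

```typescript
// Assumed tolerance for illustration; the real window is not stated in this post.
const MAX_SKEW_MS = 5 * 60 * 1000;

// Hypothetical event shape. `timestampMs` must be covered by the signature.
interface SignedEvent {
  timestampMs: number;
  signatureValid: boolean; // stand-in for the result of real signature verification
}

// Accept an event only if its signature checked out and its signed timestamp
// is within the allowed skew of the server clock, in either direction.
function acceptEvent(
  event: SignedEvent,
  nowMs: number,
  maxSkewMs: number = MAX_SKEW_MS
): boolean {
  if (!event.signatureValid) return false;
  if (!Number.isFinite(event.timestampMs)) return false;
  return Math.abs(nowMs - event.timestampMs) <= maxSkewMs;
}
```

Passing nowMs in as a parameter rather than calling Date.now() inside keeps the check deterministic in tests.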
None of that is exotic. All of it is the kind of work that gets written down because the scanner pointed at it and a human confirmed the call.
The false positives are the actual product
185 false positives is not the embarrassing number. It is the
input signal. Every category of FP we hit on www/ got converted
into a rule improvement in theauditor/rules/:
| FP category | Why it fired | New gate |
|---|---|---|
| cwe306-missing-authentication on POST routes | Auth was credential-in-body, not middleware-shaped | CREDENTIAL_IN_BODY (looks for *.safeParse in handler) |
| cwe352-missing-csrf on JSON APIs consumed by CLIs | Project never imports a cookie/session lib | NO_COOKIE_AUTH (project-level gate) |
| missing-validation on controllers that use Zod inline | Validators called in the handler body, not as middleware | Extra JOIN on *.safeParse / *.parse callsites |
| react-exposed-api-key on Ed25519 publicKey | Pattern matched substring “KEY” with no crypto context | Asymmetric-crypto awareness: public halves are public |
| shared-state-unsafe on CLI process.exitCode = N | LIKE %process% substring with no server-context check | NO_CONCURRENCY_SURFACE for CLI scripts |
| ghost-dependency on astro:content virtual imports | <scheme>: prefix wasn’t recognized as a virtual-module convention | Virtual-module recognizer (Vite, Astro, Rollup conventions) |
Each one is a rule looking at architectural context — the database
of facts we built during aud full — instead of pattern-matching
text. Per the rules SOP: every fix wires to Tier 1-4 semantic
data (resolved flow audit, junction tables, structured AST columns,
graph edges). Zero new regex on raw text.
After the rule push: the six categories above went from a combined
~30 high-confidence findings on www/ to zero on the
re-scan, without changing a line of www/ source. The bugs were
in our rules.
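To make one of those gates concrete, here is a minimal TypeScript sketch of the idea behind the virtual-module recognizer. It is an illustration of the logic only, not the rule shipped in theauditor/rules/, which works against the facts database rather than raw specifier strings. The scheme list and both function names are assumptions made for this example.

```typescript
// Schemes treated as virtual or builtin. Illustrative assumption, not the shipped list.
const VIRTUAL_SCHEMES = new Set(["astro", "virtual", "node"]);

function isVirtualModule(specifier: string): boolean {
  // Rollup convention: resolved virtual module ids are prefixed with a NUL byte.
  if (specifier.startsWith("\0")) return true;
  // Match a "<scheme>:" prefix such as "astro:content" or "virtual:my-module".
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(specifier);
  return match !== null && VIRTUAL_SCHEMES.has(match[1].toLowerCase());
}

// A ghost-dependency check should only look at bare package specifiers.
function isGhostDependency(
  specifier: string,
  declared: ReadonlySet<string>
): boolean {
  if (isVirtualModule(specifier)) return false;
  if (specifier.startsWith(".") || specifier.startsWith("/")) return false;
  // Scoped packages keep two path segments (@scope/name); others keep one.
  const parts = specifier.split("/");
  const pkg = specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
  return !declared.has(pkg);
}
```

With this gate in front, an import of astro:content is never compared against package.json at all, while an undeclared bare import such as left-pad still gets flagged.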
The arg-role gap we know we still have
One coverage gap is still open. It is filed for the next round of extractor work and does not block launch:
When a config value flows from process.env.DB_HOST into a
new pg.Pool({host: ...}) constructor, our taint discoverer
follows the resulting pool reference forward to every
pool.query("...hardcoded literal...") call in the file and
flags critical SQL injection. The query body never sees the env
var — but our *_function_call_args table doesn’t yet have an
arg_role column distinguishing “argument went into a driver
config object” from “argument went into a SQL string body.” Both
shapes look identical in the schema today.
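A rough TypeScript sketch of what the missing distinction buys. Everything here is hypothetical: the role names, the CallArg shape (a simplified stand-in for one row of a *_function_call_args table), and the callee-matching heuristics are assumptions for illustration, not the planned schema.

```typescript
// Hypothetical roles; the post only says an arg_role column is planned.
type ArgRole = "driver_config" | "sql_text" | "other";

// Simplified stand-in for one row of a *_function_call_args table.
interface CallArg {
  callee: string;   // e.g. "pg.Pool" or "pool.query"
  argIndex: number; // position of the argument in the call
  tainted: boolean; // does externally sourced data reach this argument?
}

// Illustrative heuristic: a Pool constructor takes driver config,
// and the first argument of .query() is the SQL text.
function classifyArgRole(arg: CallArg): ArgRole {
  if (/(^|\.)Pool$/.test(arg.callee)) return "driver_config";
  if (/\.query$/.test(arg.callee) && arg.argIndex === 0) return "sql_text";
  return "other";
}

// Only taint that lands in SQL text should raise a SQL injection finding.
function isSqlInjectionCandidate(arg: CallArg): boolean {
  return arg.tainted && classifyArgRole(arg) === "sql_text";
}
```

Under this model the env-derived host passed to new pg.Pool(...) is classified as driver_config and never becomes an injection candidate, and a pool.query call with a hardcoded literal stays clean because its SQL text argument is not tainted.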
Where this fires on our own admin scripts, we annotated each site
in-source with a // theauditor: ... comment explaining the
finding is a known FP pending the schema upgrade. The
CLAUDE.md REVIEW ITEMS section tracks the same gap under
arg-role coverage gap (2026-05-25) so it doesn’t drop off the
radar.
Honest reporting beats silent suppression. The annotation is in the tree, the rule still fires, the next extractor release closes the gap properly.
Operations gear that came with the scanner output
Dogfooding is not just running the binary. It is what you do with the findings, and what you build next to the binary so the next run is cheaper:
- scripts/vps_self_audit.sh. Orchestrates the full laptop-to-prod validation loop in one bash invocation. Picks the latest fresh build from build/, SCPs to the VPS, verifies the checksum, activates a license against the live API, runs aud full on a fixture, fetches the encrypted .pf/ databases back, decrypts locally, and prints a one-line PASS/FAIL summary. Re-runnable without thinking.
- Cloudflare DDoS playbook. Cold-state runbook in www/publish.md for the day someone L7-floods the API. Lays out pre-conditions to set up before you need them (Cloudflare account, zones added but proxy OFF, DNS records pre-created pointing at the VPS, IP-allowlist rule pre-written) so the switchover during an incident is two clicks instead of a fifteen-minute DNS-record exercise under pressure. Also explicitly documents what Cloudflare doesn’t help with (origin compromise, DNS-layer attacks at the registrar).
- UptimeRobot, free tier. External monitor on the API health endpoint, 5-min interval. Without it you only learn the API is down when a customer complains; the license model’s offline-grace window makes a short outage non-catastrophic, but a 5-min signal beats a 5-day signal.
The scanner produces a database of findings. The runbook produces the day-of operational story. Both ship in the same commit window for a reason.
Why we’re telling you this
Plenty of security vendors run their own tool on their own code and never publish the result. The few that do typically report a single sanitized headline number — “we scanned ourselves, we found N things, we fixed them all” — and leave the methodology opaque.
The methodology is the product.
The fact that our own dogfooding run started at 204 findings and ended with six real items, ~185 false positives, and a rule shipment that took the FP categories to zero on re-scan is exactly the loop we want to sell. A tool that produces noise without a path to reduce that noise is not a tool you can leave running in CI. A tool that turns false positives into rule improvements is.
We did not pass our own audit by accident. We passed it by being honest about what fired, why each finding fired, and which findings were wrong about us — and then by fixing the rules so the next codebase like ours doesn’t get the same noise.
Honest disclaimers
The numbers above are from one project (www/, our license
server + marketing workspace). Different codebases hit different
rule subsets. 204 is a snapshot, not a baseline.
The “~185 false positives” is a human-confirmed count, not a
tool-emitted one. After the rule push we re-ran and watched the
six FP categories go to zero on www/ — the binary’s own
arithmetic, not a vibes-based “looks better now.”
The binary is still pre-launch. We ship it when the compiled artifact clears the same OWASP corpora the source already does, not before. The license server it talks to is what we just scanned. The dogfooding loop is the prerequisite, not the finish line.
Subscribe
Subscribe via the signup form on the main site for launch notifications. We only email when there is something real to share.