Dogfooding TheAuditor on Our Own License Server

Days from public launch, we pointed our own SAST at the license server we ship on. 204 findings, 0 launch-blockers, and the false-positive rate became the next product release.

We’re a week from flipping api.theauditortool.com to public. The license server behind that hostname is the same one every paying user will eventually trust to issue, validate, and revoke their copy of our tool. So before we let anyone else point a scanner at it, we ran our own scanner over it.

One command. aud full --offline against www/. 204 findings.

Here is what fell out — both the bugs we caught and the part most vendors hide: the false positives, the rule upgrades they forced, and the operational gear that ended up next to the scanner output.

The triage breakdown

We don’t ship risk scores. We ship facts. So the only honest way to talk about a 204-finding report is to walk the triage in numbers:

| Bucket | Count | What it means |
| --- | --- | --- |
| Launch-blocking vulnerabilities | 0 | No critical chains reached an exploitable sink. Verified by hand. |
| Confirmed false positives | ~185 | Pattern-correct, context-wrong. The interesting bucket. |
| LOW/INFO hardening items worth fixing | 6 | Real, non-launch-blocking, fixed in two shipped sprints + one queued. |
| Moderate transitive CVEs | 5 | Bumped via package.json / pnpm.overrides. Zero direct-dep impact. |
| Other noise (annotated in-source) | small | Path Traversal on dev scripts, etc. Marked as known-FP in the source tree. |

The headline number is 0. The interesting number is 185.

The six hardening items (categories, not exploit recipes)

We deliberately keep this list at the category level. The point of publishing dogfooding results is to show the process, not to hand a roadmap to anyone scanning our origin tomorrow.

Two sprints already shipped:

  • Credentials in structured logs. Multiple code paths emitted a credential-shaped value into the structured logger on the happy path. Replaced every credential field in every log payload with an internal UUID that identifies the same record for operations but has zero authentication value to whoever reads the log. Hardware fingerprints dropped entirely.
  • Identity re-registration edge case. A telemetry endpoint had an idempotent-by-design path that, in one branch, returned a signing secret in the response if the same identifier was submitted twice. Changed to issue the secret once per identifier and forever after return {already_registered: true} with no secret. The client now rotates the identifier when it detects this state.
  • HTTP error mapping. Malformed JSON bodies were returning 500 INTERNAL_ERROR. Now they return 400 INVALID_JSON. The 5xx counter is reserved for actual server failures.
  • Security-header dedup. nginx and the app framework were both setting X-Frame-Options, HSTS, and friends — different layers, same headers, undefined-behavior collisions. Pick one. We picked the app layer (it travels with the binary across proxy changes).
  • Unbounded retention query. A metrics endpoint was materializing a 90-day cohort install-id list into Node memory per request, then sending it back to Postgres as a parameter list. Rewrote as a single server-side CTE: one row out, no cohort materialization.
  • Promise.all -> Promise.allSettled. A dashboard endpoint ran 11 sub-queries in parallel and rejected the whole response if any one failed. Now individual failures degrade their tile instead of taking the whole dashboard dark.
  • Dev DB port leak. A dev compose file bound Postgres to 0.0.0.0:5433 with POSTGRES_PASSWORD: dev. Anyone on the same WiFi could connect. Now bound to 127.0.0.1 with a top-of-file comment so the next person doesn’t compose up on a prod host.
  • Prod compose resource limits. Added explicit mem_limit and cpus to both api and db services. A runaway query no longer starves the VPS.
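
The identity re-registration fix above can be sketched in a few lines. This is a minimal illustration and not the server's actual code: the RegistrationStore class, the in-memory Set, the makeSecret callback, and the shape of the first response are all assumptions. Only the {already_registered: true} reply comes from the description above.

```typescript
// Response shapes. Only { already_registered: true } is taken from the post;
// the first-registration shape is an assumption for illustration.
type RegisterResponse =
  | { already_registered: false; signing_secret: string }
  | { already_registered: true };

// Hypothetical in-memory store; a real server would persist identifiers.
class RegistrationStore {
  private readonly seen = new Set<string>();
  private readonly makeSecret: () => string;

  constructor(makeSecret: () => string) {
    this.makeSecret = makeSecret;
  }

  // Issue a secret the first time an identifier is seen.
  // Every later call for the same identifier returns no secret at all.
  register(identifier: string): RegisterResponse {
    if (this.seen.has(identifier)) {
      return { already_registered: true };
    }
    this.seen.add(identifier);
    return { already_registered: false, signing_secret: this.makeSecret() };
  }
}
```

The point of the shape is that the repeat path has no field that could carry a secret, so a future refactor cannot leak one by accident without also changing the type.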

One sprint queued (in-flight as we publish this):

  • Timestamp window on signed telemetry events (closes replay).
  • Structured-logger redact paths for auth and signature headers.
  • Per-request timeout + graceful SIGTERM/SIGINT drain with pg pool flush + keep-alive socket draining (nginx upstream uses keepalive 32, plain server.close() won’t drain it).
  • Docker prod api filesystem flipped to read_only: true with tmpfs mounts for /tmp and /var/run. The application performs zero disk writes at runtime, so the hardening costs nothing.
  • Tightening a generated-credential regex to match the generator’s actual entropy alphabet, with a load-bearing comment on the generator so a future “simplification” can’t desync them.
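
As a rough sketch of the first queued item, a timestamp window check might look like the following. The field names, the five-minute tolerance, and the verdict labels are assumptions for illustration, not the shipped design. The check only helps if the timestamp is covered by the event signature, and a replay inside the window would still need separate deduplication.

```typescript
// Assumed tolerance; the post does not state the real window size.
const MAX_SKEW_MS = 5 * 60 * 1000;

// Hypothetical event shape. timestampMs must be part of the signed payload,
// otherwise an attacker could simply rewrite it.
interface SignedEvent {
  installId: string;
  timestampMs: number;
  signature: string;
}

type WindowVerdict = "ok" | "stale" | "from_future" | "malformed";

// Compare the event's signed timestamp against the server clock.
function checkTimestampWindow(
  event: SignedEvent,
  nowMs: number,
  maxSkewMs: number = MAX_SKEW_MS
): WindowVerdict {
  if (!Number.isFinite(event.timestampMs)) return "malformed";
  const ageMs = nowMs - event.timestampMs;
  if (ageMs > maxSkewMs) return "stale"; // too old: likely a replay
  if (ageMs < -maxSkewMs) return "from_future"; // client clock far ahead
  return "ok";
}
```

Passing nowMs in as a parameter, instead of reading the clock inside the function, keeps the check deterministic under test.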

None of that is exotic. All of it is the kind of work that gets written down because the scanner pointed at it and a human confirmed the call.
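
Of the shipped items, the Promise.all to Promise.allSettled change is the easiest to show in isolation. The sketch below assumes a hypothetical map of tile loaders; the TileLoader and TileResult names and the loadDashboard function are illustrative and do not come from the real endpoint.

```typescript
// Hypothetical shape: each dashboard tile is produced by one async loader.
type TileLoader = () => Promise<unknown>;

type TileResult =
  | { tile: string; ok: true; data: unknown }
  | { tile: string; ok: false; error: string };

// Run every tile query in parallel and report success or failure per tile.
// Promise.allSettled never rejects, so one failing sub-query degrades only
// its own tile instead of failing the whole response.
async function loadDashboard(
  loaders: Record<string, TileLoader>
): Promise<TileResult[]> {
  const names = Object.keys(loaders);
  // The async wrapper turns a synchronous throw into a rejection as well.
  const settled = await Promise.allSettled(
    names.map(async (name) => loaders[name]())
  );
  return settled.map((result, i): TileResult =>
    result.status === "fulfilled"
      ? { tile: names[i], ok: true, data: result.value }
      : { tile: names[i], ok: false, error: String(result.reason) }
  );
}
```

A caller can then render the failed tiles as "unavailable" and still return a 200 for the rest of the dashboard.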

The false positives are the actual product

185 false positives is not the embarrassing number. It is the input signal. Every category of FP we hit on www/ got converted into a rule improvement in theauditor/rules/:

| FP category | Why it fired | New gate |
| --- | --- | --- |
| cwe306-missing-authentication on POST routes | Auth was credential-in-body, not middleware-shaped | CREDENTIAL_IN_BODY (looks for *.safeParse in handler) |
| cwe352-missing-csrf on JSON APIs consumed by CLIs | Project never imports a cookie/session lib | NO_COOKIE_AUTH (project-level gate) |
| missing-validation on controllers that use Zod inline | Validators called in the handler body, not as middleware | Extra JOIN on *.safeParse / *.parse callsites |
| react-exposed-api-key on Ed25519 publicKey | Pattern matched substring “KEY” with no crypto context | Asymmetric-crypto awareness: public halves are public |
| shared-state-unsafe on CLI process.exitCode = N | LIKE %process% substring with no server-context check | NO_CONCURRENCY_SURFACE for CLI scripts |
| ghost-dependency on astro:content virtual imports | <scheme>: prefix wasn’t recognized as a virtual-module convention | Virtual-module recognizer (Vite, Astro, Rollup conventions) |

Each one is a rule looking at architectural context — the database of facts we built during aud full — instead of pattern-matching text. Per the rules SOP: every fix wires to Tier 1-4 semantic data (resolved flow audit, junction tables, structured AST columns, graph edges). Zero new regex on raw text.
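
To make "architectural context" concrete, here is one way a project-level gate like NO_COOKIE_AUTH could be expressed over extracted facts. This is a sketch under assumptions: the ImportFact and Finding row shapes are hypothetical, the library list is a small non-exhaustive sample, and the real rule runs against the fact database, not in-memory arrays.

```typescript
// Hypothetical fact rows; the real schema is not shown in this post.
interface ImportFact {
  file: string;
  module: string;
}

interface Finding {
  rule: string;
  file: string;
  line: number;
}

// Assumed, non-exhaustive sample of packages that imply cookie/session auth.
const COOKIE_SESSION_LIBS = new Set([
  "cookie-parser",
  "express-session",
  "cookie-session",
  "@fastify/cookie",
]);

// Project-level fact: does any file import a cookie or session library?
function projectUsesCookieAuth(imports: readonly ImportFact[]): boolean {
  return imports.some((fact) => COOKIE_SESSION_LIBS.has(fact.module));
}

// Gate: CSRF findings only make sense when the browser attaches ambient
// credentials. With no cookie/session library in the project, drop them.
function applyNoCookieAuthGate(
  findings: readonly Finding[],
  imports: readonly ImportFact[]
): Finding[] {
  if (projectUsesCookieAuth(imports)) return [...findings];
  return findings.filter((f) => f.rule !== "cwe352-missing-csrf");
}
```

The gate decides once per project and never inspects the text of the flagged route, which is the difference between a context check and a pattern match.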

After the rule push: the six categories above went from a combined ~30 high-confidence findings on www/ to zero on the re-scan, without changing a line of www/ source. The bugs were in our rules.
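
The virtual-module recognizer from the table is the simplest of these gates to illustrate. The sketch below is an assumption about its general shape, not the shipped rule: it works on plain import specifier strings, the scheme list is a small sample, and the function names are made up for this example. It relies on two real conventions: Vite's user-facing virtual: prefix and Rollup's NUL-byte prefix for resolved virtual ids.

```typescript
// Assumed sample of schemes that mark a virtual module, not an npm package.
const VIRTUAL_SCHEMES = new Set(["astro", "virtual"]);

function isVirtualModule(specifier: string): boolean {
  // Rollup convention: resolved virtual ids start with a NUL byte.
  if (specifier.startsWith("\0")) return true;
  const colon = specifier.indexOf(":");
  if (colon <= 0) return false;
  return VIRTUAL_SCHEMES.has(specifier.slice(0, colon));
}

// "lodash/fp" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg"
function packageName(specifier: string): string {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

// Return imports that look like npm packages but are not declared.
function findGhostDependencies(
  specifiers: readonly string[],
  declared: ReadonlySet<string>
): string[] {
  return specifiers.filter((s) => {
    if (s.startsWith(".") || s.startsWith("/")) return false; // file path
    if (s.startsWith("node:")) return false; // Node.js builtin scheme
    if (isVirtualModule(s)) return false; // bundler-provided module
    return !declared.has(packageName(s));
  });
}
```

With this shape, astro:content is skipped before the package.json lookup ever happens, so the rule has nothing to flag.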

The arg-role gap we know we still have

One coverage gap is still open. It is filed for the next round of extractor work and does not block launch:

When a config value flows from process.env.DB_HOST into a new pg.Pool({host: ...}) constructor, our taint discoverer follows the resulting pool reference forward to every pool.query("...hardcoded literal...") call in the file and flags critical SQL injection. The query body never sees the env var — but our *_function_call_args table doesn’t yet have an arg_role column distinguishing “argument went into a driver config object” from “argument went into a SQL string body.” Both shapes look identical in the schema today.
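
A rough sketch of what the missing column could buy, assuming a hypothetical arg_role value recorded for the argument where tainted data enters a call. The ArgRole labels, the TaintedArg row shape, and the classifyTaintedArg function are all illustrative; the real schema change has not shipped and may look different.

```typescript
// Hypothetical values for the arg_role column that does not exist yet.
type ArgRole = "driver_config" | "sql_body" | "unknown";

// Hypothetical row: one tainted value entering one call argument.
interface TaintedArg {
  source: string; // e.g. "process.env.DB_HOST"
  callee: string; // e.g. "pg.Pool" or "pool.query"
  argRole: ArgRole;
}

type Verdict = "sql-injection-candidate" | "config-only" | "needs-review";

// With a role on the argument, the two shapes stop looking identical:
// config values no longer poison every query on the resulting pool.
function classifyTaintedArg(arg: TaintedArg): Verdict {
  switch (arg.argRole) {
    case "sql_body":
      return "sql-injection-candidate";
    case "driver_config":
      return "config-only";
    default:
      // Stay conservative when the role could not be determined.
      return "needs-review";
  }
}
```

Rows with an unknown role still surface for review, so the gate would reduce noise without hiding anything the extractor could not classify.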

Where this fires on our own admin scripts, we annotated each site in-source with a // theauditor: ... comment explaining the finding is a known FP pending the schema upgrade. The CLAUDE.md REVIEW ITEMS section tracks the same gap under arg-role coverage gap (2026-05-25) so it doesn’t drop off the radar.

Honest reporting beats silent suppression. The annotation is in the tree, the rule still fires, the next extractor release closes the gap properly.

Operations gear that came with the scanner output

Dogfooding is not just running the binary. It is what you do with the findings, and what you build next to the binary so the next run is cheaper:

  • scripts/vps_self_audit.sh — orchestrates the full laptop-to-prod validation loop in one bash invocation. Picks the latest fresh build from build/, SCPs to the VPS, verifies the checksum, activates a license against the live API, runs aud full on a fixture, fetches the encrypted .pf/ databases back, decrypts locally, and prints a one-line PASS/FAIL summary. Re-runnable without thinking.
  • Cloudflare DDoS playbook. Cold-state runbook in www/publish.md for the day someone L7-floods the API. Lays out pre-conditions to set up before you need them (Cloudflare account, zones added but proxy OFF, DNS records pre-created pointing at the VPS, IP-allowlist rule pre-written) so the switchover during an incident is two clicks instead of a fifteen-minute DNS-record exercise under pressure. Also explicitly documents what Cloudflare doesn’t help with (origin compromise, DNS-layer attacks at the registrar).
  • UptimeRobot, free tier. External monitor on the API health endpoint, 5-min interval. Without it you only learn the API is down when a customer complains; the license model’s offline-grace window makes a short outage non-catastrophic, but a 5-min signal beats a 5-day signal.

The scanner produces a database of findings. The runbook produces the day-of operational story. Both ship in the same commit window for a reason.

Why we’re telling you this

Plenty of security vendors run their own tool on their own code and never publish the result. The few that do typically report a single sanitized headline number — “we scanned ourselves, we found N things, we fixed them all” — and leave the methodology opaque.

The methodology is the product.

The fact that our own dogfooding run started at 204 findings and ended with six real items, ~185 false positives, and a rule shipment that took the FP categories to zero on re-scan is exactly the loop we want to sell. A tool that produces noise without a path to reduce that noise is not a tool you can leave running in CI. A tool that turns false positives into rule improvements is.

We did not pass our own audit by accident. We passed it by being honest about what fired, why each finding fired, and which findings were wrong about us — and then by fixing the rules so the next codebase like ours doesn’t get the same noise.

Honest disclaimers

The numbers above are from one project (www/, our license server + marketing workspace). Different codebases hit different rule subsets. 204 is a snapshot, not a baseline.

The “~185 false positives” is a human-confirmed count, not a tool-emitted one. After the rule push we re-ran and watched the six FP categories go to zero on www/ — the binary’s own arithmetic, not a vibes-based “looks better now.”

The binary is still pre-launch. We ship it when the compiled artifact clears the same OWASP corpora the source already does, not before. The license server it talks to is what we just scanned. The dogfooding loop is the prerequisite, not the finish line.

Subscribe

Subscribe via the signup form on the main site for launch notifications. We only email when there is something real to share.