Goldfish

Stop Re-Asking the LLM the Same Question

Incremental distillation: hash the inputs, skip unchanged checks, and cut ~64% of LLM calls on an unchanged re-run — with an honest note on what is not gated yet.

Matt Yonkovit · 4 min read

So I had a tool that runs an AI analysis on a Postgres database. Worked great. Then I looked at what it cost to run twice in a row against a system where nothing had changed, and the number annoyed me.

Same data in. Same analysis out. Full token bill, both times. The model dutifully re-read 100 checks’ worth of identical metrics and re-generated 100 checks’ worth of basically-identical findings. I was paying a brilliant, expensive intern to re-read the same report every morning and tell me the same things. That’s not analysis. That’s a subscription to déjà vu.

The obvious idea nobody implements

If a check’s data is byte-for-byte identical to last run, there is nothing new to analyze. So don’t. Reuse last time’s answer.

The trick is knowing what counts as “identical.” You hash it:

def changed_check_ids(current_details, prior_details) -> set[str]:
    changed = set()
    for cid, details in current_details.items():
        if cid not in prior_details or _hash(details) != _hash(prior_details[cid]):
            changed.add(cid)
    return changed

Hash every check’s collected data with sorted keys, so a reordered dict doesn’t read as a change. Anything new or different lands in the set. Everything else is, barring a hash collision, the same as last time.

Then the analyzer skips any category whose checks are all unchanged and reuses the prior findings for it. And it’s conservative on purpose: one changed check in a category sends the whole category back to the model. I would much rather re-analyze a few unchanged checks than skip a real change. Skipping a real finding to save a fraction of a cent is exactly the kind of “optimization” that turns into an incident.
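The category rule is easy to sketch too. This is an illustration under assumptions, not the analyzer's real code: `check_category`, `plan_run`, and the shape of `prior_findings` (a dict keyed by category) are hypothetical names invented for the example.

```python
def categories_to_reanalyze(check_category: dict[str, str], changed: set[str]) -> set[str]:
    """A category is dirty if any one of its checks changed."""
    return {check_category[cid] for cid in changed if cid in check_category}


def plan_run(check_category: dict[str, str], changed: set[str], prior_findings: dict):
    """Split categories into 'send to the model' and 'reuse last run's findings'."""
    dirty = categories_to_reanalyze(check_category, changed)
    all_categories = set(check_category.values())
    # Reuse only when the category is clean AND there is something to reuse.
    reused = {cat: prior_findings[cat] for cat in all_categories - dirty if cat in prior_findings}
    # Everything else goes back to the model, including clean categories
    # that have no prior findings on record.
    to_analyze = all_categories - set(reused)
    return to_analyze, reused


# Hypothetical mapping of check id -> category.
check_category = {
    "bloat": "storage",
    "vacuum_lag": "storage",
    "conns": "connections",
    "locks": "concurrency",
}
prior_findings = {
    "storage": ["orders table is bloated"],
    "connections": ["connection count is healthy"],
    "concurrency": ["no lock waits"],
}

# Only "bloat" changed, but all of "storage" is re-analyzed, vacuum_lag included.
to_analyze, reused = plan_run(check_category, {"bloat"}, prior_findings)
print(to_analyze)      # {'storage'}
print(sorted(reused))  # ['concurrency', 'connections']
```

The deliberate waste is visible here: `vacuum_lag` didn't change, yet it rides along with `bloat` because they share a category. That is the price of never skipping a real change.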

It compounds

The skip shows up twice, which is the fun part. The analyzer skips generating findings for unchanged categories. And the judge (the verification layer that votes on findings) skips re-voting on any finding it already has a verdict for:

for f in findings:
    if f.verdict and f.verdict != "unverified":
        continue   # carried forward from last run, already judged — don't re-spend tokens
    ...
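To make the fragment above runnable, here is a small sketch that fills in the elided part with a stand-in. The `Finding` dataclass, the `vote` callback, and the verdict strings other than "unverified" are assumptions for illustration; the real judge's voting logic isn't shown in this post.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    title: str
    verdict: Optional[str] = None  # None or "unverified" means not judged yet


def judge_findings(findings: list[Finding], vote: Callable[[Finding], str]) -> int:
    """Judge only findings without a settled verdict; return how many votes were spent."""
    calls = 0
    for f in findings:
        if f.verdict and f.verdict != "unverified":
            continue  # carried forward from last run, already judged
        f.verdict = vote(f)  # stand-in for the LLM call in a real judge
        calls += 1
    return calls


findings = [
    Finding("index bloat on orders", verdict="confirmed"),  # carried forward
    Finding("connection count climbing"),                   # new this run
    Finding("autovacuum lagging", verdict="unverified"),    # never settled
]
spent = judge_findings(findings, vote=lambda f: "confirmed")
print(spent)  # 2
```

Only the two findings without a settled verdict cost a vote. A second pass over the same list would spend nothing.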

Generation skip plus verification skip. Run an audit twice against an unchanged database and the measured result was 31 LLM calls down to 11 — about 64% fewer. The test doesn’t pin those exact counts; it enforces the property that matters:

assert second_run_calls < first_run_calls   # 2nd run = identical telemetry → strictly fewer calls

I like making cost a property the tests enforce. “Be efficient with tokens” is advice nobody follows. “This test fails if the unchanged re-run costs as much as the first run” is a guardrail that follows itself.
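If you want to try the same guardrail on your own pipeline, a sketch like this is enough to get started. Everything in it is hypothetical: `CountingLLM` is a fake client that only counts calls, and `run_audit` is a toy that hashes per category (a simplification with the same effect as "one changed check re-sends the whole category").

```python
import hashlib
import json


class CountingLLM:
    """Stand-in for a real model client; it only counts how often it is asked."""

    def __init__(self):
        self.calls = 0

    def analyze(self, category, checks):
        self.calls += 1
        return [f"finding for {category}"]


def _digest(data) -> str:
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()


def run_audit(telemetry, llm, cache):
    """telemetry: {category: {check_id: details}}; cache: {category: (digest, findings)}."""
    results = {}
    for category, checks in telemetry.items():
        digest = _digest(checks)
        cached = cache.get(category)
        if cached and cached[0] == digest:
            results[category] = cached[1]  # unchanged: reuse, no model call
            continue
        findings = llm.analyze(category, checks)
        cache[category] = (digest, findings)
        results[category] = findings
    return results


def test_unchanged_rerun_is_cheaper():
    telemetry = {
        "storage": {"bloat": {"dead_tuples": 120}},
        "connections": {"conns": {"active": 14}},
    }
    llm, cache = CountingLLM(), {}

    run_audit(telemetry, llm, cache)
    first_run_calls = llm.calls

    run_audit(telemetry, llm, cache)  # identical telemetry
    second_run_calls = llm.calls - first_run_calls

    assert second_run_calls < first_run_calls


test_unchanged_rerun_is_cheaper()
```

The assertion is a strict inequality rather than an exact count, so the test keeps passing as checks are added and only fails if the skip logic stops working.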

The honest part

Here’s where I tell you the thing a launch post would leave out. That 64% isn’t 100%, and the gap has a reason: there’s a separate per-category prose layer that still runs ungated on every audit. The findings and the judge are gated; the prose layer is the next slice. So the real claim is “cut the analysis and verification cost on unchanged checks,” not “cut your whole LLM bill.”

Why am I telling you the unflattering version? Because I’ve read enough launch posts that round every partial win up to a total victory, and it erodes trust the second a reader actually runs the thing and sees the gap. An honest “we got two of three layers, the third’s still open” is worth more than a confident overstatement that falls apart on contact. (It’s also written down in the build log, so future me doesn’t get to quietly forget there’s a layer left.)

What that actually saves you

It means the boring scheduled re-audit, the one that runs every night and usually finds nothing new, makes roughly a third of the LLM calls it used to (11 instead of 31 in the measured run), and the spend concentrates on the checks that actually moved. The expensive machinery wakes up for change and dozes through stability. Which is how it should’ve worked from the start.

The broader point, past this one tool: if you’re running an LLM against recurring data, you are almost certainly paying full price to re-derive answers you already have. Content-hash the inputs. Skip the unchanged. Reuse prior outputs. And — this is the part that makes it real — write the test that fails if you don’t.

Open source, with a tutorial that walks through the hashing and the skip logic. Go look at what your own AI pipeline charges you to tell you nothing new.