Teaching an Audit Tool to Remember

Most “AI analysis” tools do a dumb thing, and once you see it you can’t unsee it: they have no memory. Run the analysis Monday, get a report. Run it again Tuesday on the same system, and the model re-discovers every single thing it found Monday, re-explains it in fresh words, and charges you full freight for the privilege. It’s Groundhog Day, except the bill is real.

The fix isn’t complicated, but the right version of it has one design decision people get wrong, so let me walk through it.

Findings that know their own name

I built a Postgres health tool where the model emits structured findings: a claim plus the data it cited. The first useful move was giving each finding a stable identity: a hash of the check it came from plus its normalized claim.

finding_id = sha1(f"{check_id}|{normalized_claim}").hexdigest()[:16]

Same problem on the same check, run after run, produces the same id. That’s the hook for everything else. If today’s finding has an id we saw last week, it’s the same finding, and we can carry its history forward instead of pretending we just met it.

The decision people get wrong

Naive memory says: when a problem gets fixed, reach back into last week’s report and flip that finding to “resolved.” Don’t do that. An audit is a photograph of a moment. Going back and editing the photograph to match today is just… lying about the past. And in a tool people use to track whether things are getting better or worse, a lying history is worse than no history.

So the rule is: history is immutable. Lineage is forward-recorded. A new finding carries two facts: first_seen_audit (how long has this been broken?) and supersedes (which older finding did this replace, if its value changed?). “Resolved” isn’t a status you write into the past; it’s something you compute when you compare two audits. The old reports stay exactly as they were taken.

for f in new_findings:
    prior = prior_by_id.get(f.finding_id)
    if prior:                                   # seen before
        f.first_seen_audit = prior["first_seen_audit"]   # carry the original date forward
    else:                                        # new or changed
        f.first_seen_audit = current_audit_id
        f.supersedes = [p for p in prior_on_same_check if p not in new]

I lifted the shape of this from an agent-memory project a colleague built — the good idea there is that memory isn’t an append-only log, it’s a set of facts that supersede each other over time. You don’t delete the old fact. You record that a newer one replaced it. Bitemporal databases have done this for decades; turns out it’s exactly right for AI findings too.

What you get out of it

Two things, and the second one is the one users actually care about.

First, the model stops wasting its time. Before generating findings, the tool feeds it last audit’s findings: “you found these already — tell me what still applies, don’t re-derive from scratch.” Less re-diagnosing, lower cost, more consistent output. (Small hygiene note that bit me: those prior claims get their newlines stripped before they go into the prompt, because a finding’s text is data, and data doesn’t get to smuggle in new instructions. Prompt injection isn’t always an attacker — sometimes it’s your own stored content with a stray newline.)

Second, and bigger: the report finally answers the only question a returning user has. Not “what’s my score” — they kind of know. The question is “what changed?”

Findings Changes:
  Resolved: 2   Changed: 1   New: 3   Persistent: 8

That’s the difference between a tool that re-states your situation every Tuesday and a tool that tells you the vacuum problem you fixed is gone, one config drifted, and three new things showed up. One of those is a report. The other is a babysitter that forgot you the moment it finished talking.

The sharp edge

One honest gotcha, because the code review caught it and it’s a good lesson. The first version let two new findings on the same check both claim to supersede the same old one. Two children, one parent — ambiguous lineage, and it makes the “what changed” diff lie. The fix was a one-line guard: only the first new finding on a check inherits the supersession; the rest are genuinely new.

Tiny bug. But the entire value of memory is an accurate “what changed,” and a sloppy lineage model quietly corrupts exactly that. The unsexy invariants are load-bearing.

The bigger pattern

If you’re bolting AI onto anything that runs more than once against the same target — monitoring, audits, code analysis, recurring reports — the no-memory default is costing you money and hiding the most useful thing you could show. Give your findings stable identities. Record lineage forward, never by rewriting history. Compute “resolved” at comparison time. Feed yesterday’s findings back in so the model stops reinventing them.

It’s maybe a hundred lines. The whole thing is open source, with a tutorial that walks the reconcile logic (and that supersession bug) step by step. Your AI should remember what it told you yesterday. Mine finally does.