Drowning in 20,000 Findings: Designing for Signal, Not Noise

Run a real assessment against a real legacy codebase and you don’t get 50 findings. You get thousands. Twenty thousand is not unusual for a big, old, multi-language application. And here’s the trap that catches a lot of analysis tools: a tool that finds 20,000 things and shows you 20,000 things has not helped you. It has moved the problem from “I don’t know what’s wrong” to “I’m staring at a list that might as well be infinite and I have no idea where to start,” which feels like progress and isn’t.

The hard engineering problem in an assessment tool isn’t finding the issues. Honestly, finding them is the easy part once you’ve got the rule engine and the LLM tiers. The hard part is presenting twenty thousand of them so a human can actually act. That’s a design problem, and getting it wrong wastes the entire assessment.

Why the raw count is so big (and so misleading)

Two things inflate the number. First, the same underlying problem shows up many times — one Oracle pattern used in 300 places is 300 findings, even though it’s one decision to make and then apply. Second, the multi-tier funnel means the same site can get flagged by a rule, by the extraction pass, and by the LLM, each describing it a little differently. Naively, that’s three findings for one issue, times thousands of issues. The count balloons, and most of the balloon is redundancy, not distinct work.

So the first job isn’t showing the findings. It’s making the count honest.

Consolidation: collapse the redundancy

Swordfish does a few things to turn a noisy pile into a real worklist.

It deduplicates and corroborates across tiers. When the rule engine, the extraction pass, and the LLM all flag the same site, that’s not three problems — it’s one problem with three independent confirmations. Collapsing them removes the redundancy and raises your confidence in what’s left, which is a nice two-for-one.
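To make the cross-tier collapse concrete, here is a minimal sketch in Python. It assumes a "site" can be identified by file path and line number, and the `RawFinding` and `SiteFinding` types, the tier names, and the sample messages are all hypothetical illustrations rather than Swordfish’s actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawFinding:
    """One flag from one tier (hypothetical shape)."""
    path: str
    line: int
    tier: str      # assumed tier labels: "rule", "extraction", "llm"
    message: str


@dataclass
class SiteFinding:
    """One site, with every tier that independently flagged it."""
    path: str
    line: int
    tiers: set = field(default_factory=set)
    messages: list = field(default_factory=list)

    @property
    def confirmations(self) -> int:
        # Number of distinct tiers that agree on this site.
        return len(self.tiers)


def corroborate(raw):
    """Collapse findings that share a site into one corroborated finding."""
    sites = {}
    for f in raw:
        key = (f.path, f.line)  # assumption: path + line identifies a site
        site = sites.setdefault(key, SiteFinding(f.path, f.line))
        site.tiers.add(f.tier)
        site.messages.append(f.message)
    return list(sites.values())


raw = [
    RawFinding("billing.sql", 42, "rule", "NVL on possibly empty string"),
    RawFinding("billing.sql", 42, "extraction", "empty-string comparison"),
    RawFinding("billing.sql", 42, "llm", "NULL handling difference"),
    RawFinding("orders.sql", 7, "rule", "ROWNUM pagination"),
]
merged = corroborate(raw)
for site in merged:
    print(site.path, site.line, site.confirmations, sorted(site.tiers))
```

Four raw findings become two sites, and the first one carries three confirmations instead of appearing three times. A real pipeline would likely need a fuzzier notion of "same site" than an exact line match.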

It detects phrasing drift. The LLM tiers will describe the same underlying concern in slightly different words across hundreds of sites — “NULL handling difference,” “COALESCE semantics,” “empty-string mismatch.” Swordfish groups these under a single concern_key, so you see one concern with its 300 occurrences attached, not 300 findings that look distinct but aren’t. You make the decision once; the grouping shows you everywhere it applies.
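The grouping step might look something like the sketch below. Only the name concern_key comes from the description above; how Swordfish actually derives the key is not covered here, so the alias table, the key strings, and the helper functions are hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical alias table: several phrasings of one underlying concern
# map to a single key. A real system would derive this some other way.
ALIASES = {
    "null handling difference": "oracle.empty_string_is_null",
    "coalesce semantics": "oracle.empty_string_is_null",
    "empty-string mismatch": "oracle.empty_string_is_null",
}


def concern_key(message: str) -> str:
    """Normalize case and whitespace, then look up a shared key."""
    normalized = " ".join(message.lower().split())
    # Unknown phrasings fall back to their own normalized text.
    return ALIASES.get(normalized, normalized)


def group_by_concern(findings):
    """Map each concern_key to the list of sites where it occurs."""
    groups = defaultdict(list)
    for site, message in findings:
        groups[concern_key(message)].append(site)
    return dict(groups)


findings = [
    ("billing.sql:42", "NULL handling difference"),
    ("orders.sql:118", "COALESCE semantics"),
    ("users.sql:9", "Empty-string  mismatch"),
    ("report.sql:3", "ROWNUM pagination"),
]
concerns = group_by_concern(findings)
for key, sites in concerns.items():
    print(f"{key}: {len(sites)} occurrence(s)")
```

Three differently worded findings land under one key with three occurrences attached, which is the shape the worklist needs: one decision, many sites.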

The result is that the number you actually engage with is the number of decisions you have to make, which is dramatically smaller than the number of places those decisions apply. That distinction is the whole ballgame.

Then: rank, group, and let people drill down

Even consolidated, it’s a lot. So the presentation has to do triage for you:

Severity and effort on every finding, so you can sort by “what’ll hurt most” and “what’s cheapest to fix” and attack the high-impact-low-effort quadrant first. A behavioral trap that corrupts data and a cosmetic syntax nit should never sit in the same undifferentiated list.

A grouped view, not a flat one. You work at the level of “here’s the empty-string concern, it hits 300 sites, here’s the recommendation” — and then drill into the specific occurrences only when you need to. The default view is decisions; the detail is one click away, not in your face.

Counts that stay trustworthy at scale. The summary numbers (by severity, by category, by status) are computed to match the filtered list exactly, and the detail lists are paginated rather than dumping 20,000 rows into one response and melting your browser. Sounds obvious; plenty of tools get it wrong and either lie in the summary or hang on the list.
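One way to keep the summary and the list from disagreeing is to filter once and derive both from that same filtered list, then slice a page out of it. The sketch below assumes that approach; the `Concern` type, the severity and effort scales, and the `triage_page` function are hypothetical, and sorting by severity and then effort is only a rough approximation of "high impact, low effort first."

```python
from collections import Counter
from dataclasses import dataclass

# Assumed orderings: lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
EFFORT_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Concern:
    key: str
    severity: str
    effort: str
    status: str
    occurrences: int


def triage_page(concerns, *, status=None, page=1, page_size=50):
    """Return one ranked page plus summary counts for the same filter."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")

    # Filter once; the summary and the page both come from this list,
    # so the counts cannot drift from what the user can page through.
    filtered = [c for c in concerns if status is None or c.status == status]

    # Most severe first, then cheapest to fix, then most widespread.
    ranked = sorted(
        filtered,
        key=lambda c: (SEVERITY_RANK[c.severity], EFFORT_RANK[c.effort], -c.occurrences),
    )

    start = (page - 1) * page_size
    return {
        "total": len(ranked),
        "by_severity": dict(Counter(c.severity for c in ranked)),
        "items": ranked[start:start + page_size],
    }


concerns = [
    Concern("cosmetic.keyword_case", "low", "low", "open", 1200),
    Concern("oracle.empty_string_is_null", "critical", "medium", "open", 300),
    Concern("oracle.rownum_pagination", "high", "low", "open", 85),
    Concern("oracle.connect_by", "high", "high", "resolved", 12),
]
result = triage_page(concerns, status="open", page=1, page_size=2)
print(result["total"], result["by_severity"], [c.key for c in result["items"]])
```

The cosmetic concern has the most occurrences but ranks last among the open ones, and the severity counts always sum to the total for the active filter. In a database-backed tool the same idea would mean running the count query and the page query with an identical filter clause.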

The principle: the tool’s job is to direct attention

Here’s the lesson I’d pull out of this for anyone building analysis tooling of any kind: finding problems is table stakes; ranking and grouping them is the actual product. Human attention is the scarce resource in a migration, not compute. A tool that respects that — that collapses redundancy, surfaces the decisions, ranks by impact, and gets out of the way until you ask for detail — turns 20,000 findings into a morning’s worth of prioritized decisions. A tool that doesn’t turns the same 20,000 into a reason to give up and do it by hand.

Twenty thousand findings is what honesty looks like; a legacy codebase really does differ from its target in that many places. The measure of the tool isn’t whether it finds them. It’s whether, after it finds them, you know exactly what to do first. If you’re evaluating one, paste in something big and ugly and watch what it does with the list. If it shows you all twenty thousand, it found your problems and made them yours.

Next: a war story about a bug we shipped in our own findings pipeline, and what it taught us about data integrity.

Swordfish is an open-source (Apache-2.0) assessment harness for migrating Oracle, MySQL, SQL Server, Sybase, and DB2 to PostgreSQL — it shows you what’s in your codebase, what needs to change, and hands scoped tasks to the copilot you already use. Source: github.com/EnterpriseDB/swordfish-migrations