Right-Sized AI: Why You Don't Need GPT-5 to Migrate a Stored Procedure

The reflex to reach for the biggest model is expensive, often unnecessary, and sometimes flatly disqualifying.

Matt Yonkovit · 4 min read

I’ve been beating this drum for a while now, in AI infrastructure generally, and it applies cleanly to migrations: the instinct to throw the largest frontier model at every problem is usually the wrong instinct. Not because frontier models aren’t good — they’re great — but because “good enough at the right cost, in the right place, under the right constraints” beats “maximally capable” for most real work. Migration tooling is a near-perfect case study.

The three reasons the biggest model is often the wrong tool

Cost, at the scale this actually runs. A migration codebase can run to millions of lines, and you don’t scan it once; you re-scan as you work, across a project that can last for months. Route all of that through a top-tier frontier API and the bill becomes a line item somebody senior starts asking about. Now remember the funnel: deterministic rules already handle the known patterns for free, and the LLM only touches the long tail. For that long tail, a well-chosen mid-size coder model running locally handles the large majority of “rewrite this PL/SQL to PL/pgSQL” tasks perfectly well. You spend frontier-model money only where the problem genuinely demands frontier-model judgment, and that is a small slice.
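To make the funnel concrete, here is a minimal sketch in Python. It is not Swordfish's rule engine: the two rewrite rules, the `NEEDS_MODEL` marker list, and the function names are assumptions chosen for illustration. The point is only the shape: cheap pattern rules run first, and whatever still contains a construct the rules do not cover is queued for a model.

```python
import re

# Hypothetical deterministic rules: (compiled pattern, replacement).
# NVL -> COALESCE and dropping "FROM DUAL" are common Oracle-to-PostgreSQL rewrites.
RULES = [
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    (re.compile(r"\s+FROM\s+DUAL\b", re.IGNORECASE), ""),
]

# Hypothetical markers for constructs the rules above do not cover.
NEEDS_MODEL = re.compile(r"\bCONNECT\s+BY\b|\bROWNUM\b", re.IGNORECASE)


def apply_rules(sql):
    """Apply every deterministic rewrite in order and return the result."""
    for pattern, replacement in RULES:
        sql = pattern.sub(replacement, sql)
    return sql


def triage(snippets):
    """Split snippets into (handled by rules, long tail that still needs a model)."""
    done, long_tail = [], []
    for sql in snippets:
        rewritten = apply_rules(sql)
        if NEEDS_MODEL.search(rewritten):
            long_tail.append(rewritten)
        else:
            done.append(rewritten)
    return done, long_tail
```

Calling `triage` on a batch returns two lists. Only the second one costs anything to process, which is why the size of the long tail, not the size of the codebase, drives the model bill.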

Privacy, which for a lot of shops is non-negotiable. Think about what a migration tool reads: your entire proprietary application codebase, your schema, your business logic. For a bank, a healthcare system, a defense contractor, “ship all of that to an external API” isn’t a cost question — it’s a flat policy violation that ends the conversation. A right-sized local model changes the math entirely. Swordfish runs its embeddings on a local ONNX model by default (no network, no key) and its LLM layer is provider-agnostic, so you can point it at a model running on your own hardware. The whole assessment, including the AI tiers, can run inside your network. You can’t do that with a model that only exists behind someone else’s API.
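One way to enforce “nothing leaves the network” in code, rather than in a policy document, is to refuse any model endpoint that is not local. The sketch below is an assumption about how such a guard could look; it is not Swordfish's configuration or API, and the function names and example URLs are hypothetical. It accepts only `localhost` or a private or loopback IP literal, and it deliberately treats any DNS name as external because it does not resolve names.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url):
    """Return True only if the URL's host is localhost or a private/loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it is treated as external.
        return False
    return addr.is_private or addr.is_loopback


def choose_endpoint(candidates):
    """Return the first in-network endpoint, or raise if none qualifies."""
    for url in candidates:
        if is_internal_endpoint(url):
            return url
    raise ValueError("no in-network model endpoint configured")
```

A stricter version would resolve hostnames and check the resulting addresses, or use an explicit allowlist; the conservative default here is to fail closed.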

Latency and control. You can run a local or self-hosted model as hard as you want and batch however you like, with no rate limits and no dependence on a vendor’s uptime. For a long-running migration project, that operational control is worth more than the last few points of capability on a benchmark.

“But will a smaller model actually do the job?”

For this work, mostly yes, and the reason is that migration translation is a constrained task, not an open-ended one. You’re not asking the model to invent; you’re asking it to translate a known construct from one dialect to another, often with the deterministic layer having already identified exactly what the construct is. “Rewrite this CONNECT BY as a recursive CTE” is a far smaller ask than “understand my entire business.” A 30B-class coder model is comfortably capable of the former, and the former is most of what the LLM tier is doing.
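For a sense of how bounded the CONNECT BY task is, here is a small worked example. The `employees` table, its columns, and the sample rows are invented for illustration. The Oracle query is shown as a string for comparison only; the recursive CTE is standard SQL that PostgreSQL accepts, and SQLite accepts it too, which lets the sketch run without a database server.

```python
import sqlite3

# Oracle original, shown for comparison only (it is not executed here).
ORACLE_SQL = """
SELECT id, name, LEVEL AS depth
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR id = manager_id
"""

# Recursive CTE form. START WITH becomes the anchor member,
# CONNECT BY PRIOR becomes the join in the recursive member,
# and the LEVEL pseudocolumn becomes an explicit counter.
RECURSIVE_CTE = """
WITH RECURSIVE org AS (
    SELECT id, name, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id
)
SELECT id, name, depth FROM org ORDER BY depth, id
"""


def run_demo():
    """Build a tiny hierarchy in memory and walk it with the recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Di", 2)],
    )
    rows = conn.execute(RECURSIVE_CTE).fetchall()
    conn.close()
    return rows
```

One caveat the sketch glosses over: Oracle returns hierarchical rows in tree order, while a recursive CTE guarantees no order without an explicit ORDER BY, so code that depends on row order needs a second look.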

Where you do want more capability is the genuinely hard, ambiguous cases — the gnarly dynamically-assembled query, the judgment call on whether a finding is a real problem in context. And that’s exactly where the multi-model validation comes in: reserve the heavier, possibly-frontier model for the uncertain slice, cross-check it, and don’t waste it on the ROWNUM rewrites a local model nails every time. Right-sizing isn’t “always use the small model.” It’s “use the smallest model that does this job well, and escalate deliberately.”
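“Escalate deliberately” can be sketched as a loop over tiers ordered from smallest to largest, with a check between each step. Everything below is a stand-in: the two “models” are plain functions with canned behavior, and `is_acceptable` is a crude marker check where a real setup might parse the output against PostgreSQL or cross-check it with a second model. None of these names come from Swordfish.

```python
def small_local_model(sql):
    # Stand-in for a local coder model: handles only one simple ROWNUM case.
    return sql.replace("WHERE ROWNUM <= 10", "LIMIT 10")


def larger_model(sql):
    # Stand-in for a heavier model: returns a canned placeholder translation.
    return "/* recursive CTE rewrite */"


def is_acceptable(source, candidate):
    # Stand-in validator: reject output that still contains Oracle-only markers.
    upper = candidate.upper()
    return not any(marker in upper for marker in ("ROWNUM", "CONNECT BY"))


def translate_with_escalation(source, tiers, validate):
    """Try each (label, translate) tier in order; return the first validated result.

    Returns (None, None) when no tier passes, which a caller would
    treat as "flag for human review".
    """
    for label, translate in tiers:
        candidate = translate(source)
        if validate(source, candidate):
            return label, candidate
    return None, None


TIERS = [("local", small_local_model), ("large", larger_model)]
```

With these stand-ins, a simple ROWNUM query stops at the `local` tier, and a CONNECT BY query falls through to `large`. The heavier tier is only invoked for inputs the cheaper one failed to translate acceptably.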

The principle, generalized

Here’s the take I’ll put my name on: in most AI systems, the model is not the hard part and not the right place to spend. The hard part is the data engineering around it — what you feed it, how you’ve narrowed the problem before it gets there, how you verify what comes back. A migration tool that does the deterministic work first, runs local-by-default, and escalates to bigger models only for the cases that earn it will be cheaper, more private, and more trustworthy than one that reflexively pipes everything to the largest available model and prays.

The biggest model is a fine tool. It’s just rarely the only tool, and treating it as the default is how you end up with a migration assessment that’s expensive, can’t run in a regulated environment, and isn’t actually more correct for the trouble. Size the model to the job. Most of these jobs are smaller than the hype wants you to believe.


Swordfish is an open-source (Apache-2.0) assessment harness for migrating Oracle, MySQL, SQL Server, Sybase, and DB2 to PostgreSQL — it shows you what’s in your codebase, what needs to change, and hands scoped tasks to the copilot you already use. Source: github.com/EnterpriseDB/swordfish-migrations