Goldfish

opinion

Why Database Migrations Are a Developer Problem

The schema is the easy part. The code is where migrations go to die.

Matt Yonkovit · 8 min read

I’ve watched this movie maybe fifty times now, across companies you’ve heard of and a few you haven’t. It always opens the same way. Some VP decides the org is getting off Oracle (or Sybase, or that SQL Server cluster nobody’s patched since the Obama administration). A DBA gets handed “the migration.” A date goes on a slide. Everybody nods.

And then six weeks in, somebody finally opens the application repo.

That’s the moment the timeline quietly doubles. Because the database (the schema, the tables, the indexes, the constraints) was never the hard part. We’ve had decent tools for converting DDL for twenty years. ora2pg exists. The type mappings are mostly known. You can move the shape of the data in a weekend if that’s all there is.

It’s never all there is. The migration isn’t a database project wearing a database costume. It’s a code project, and the code is full of landmines that no schema converter will ever see.

The iceberg nobody drew on the slide

This is the part that makes DBAs and developers point fingers at each other: the schema is maybe 20% of the work, and it’s the 20% that’s visible. It’s in the data dictionary. You can query it. A tool can read every table and every column and tell you, deterministically, “this NUMBER(10) becomes a bigint.”

The other 80% is application code, and it does not live in the data dictionary. It lives in:

  • Embedded SQL scattered across Java, Python, PHP, Go, and that one C# service the original author left the company over.
  • ORM-generated queries where the SQL doesn’t even exist as text until runtime, so nobody can grep for it.
  • Stored procedures and packages that encode actual business rules (pricing logic, eligibility checks, the reason orders get rejected) in PL/SQL or T-SQL that has to be rewritten by hand.
  • Dynamic SQL, the worst offender, where queries get built by string concatenation based on user input, feature flags, and the phase of the moon.

You cannot convert what you cannot find. And most teams cannot find their own SQL. I’m not being insulting. I’ve run the numbers with customers, and the gap between “SQL the team thinks they have” and “SQL they actually have” is routinely 30-40%. The dynamic stuff, the ORM stuff, the report query someone hardcoded into a cron job in 2014 — it’s all real, it all runs in production, and it all has to work after you move.

The dialect trap: it compiles, then it corrupts

Say you find all the SQL. Congratulations, you’re ahead of most teams. Now you have a worse problem, because a lot of it will port without complaining and then behave differently.

This is the category that keeps me up at night, and it should keep you up too. A few of my favorites:

  • Oracle thinks the empty string is NULL. Postgres does not. So every WHERE comment = '' and every NVL(notes, 'none') that quietly leaned on that behavior now does something subtly different. No syntax error. No failed test, unless you happened to write the test for exactly that case. Just wrong answers, three weeks after go-live, in the reporting that finance trusts.
  • MySQL’s default collation is case-insensitive. Your login lookup, your “is this email already registered” check, your dedup job — all of them assumed 'Matt' = 'matt'. Postgres says those are different strings. Surprise: duplicate accounts.
  • SQL Server’s LEN() ignores trailing spaces. Postgres length() doesn’t. If any logic compares lengths, you just changed the answer.
  • Integer division, NULL propagation in aggregates, date math, implicit casts. Every engine has its own opinions, and your code has been quietly depending on a specific engine’s opinions for a decade.

None of this shows up in a DDL diff. All of it shows up in production. The schema converter said “done” and it was lying. Not on purpose; it just literally cannot see behavior. It sees structure.

The stuff that isn’t in the code at all

Now we get to the genuinely hard part, the part I think the whole industry underestimates: a huge amount of what your application “knows” was never written down anywhere a machine can read it.

I call it inferred knowledge. You’ve seen it. You’ve written it.

-- somewhere in a 4,000-line stored procedure
WHERE order_status != 3
  AND region_id IN (1, 2, 7)

What’s 3? It’s “cancelled.” Everyone on the team knows it’s cancelled. There’s no enum, no lookup table, no comment. It’s cancelled because Dave said so in a hallway in 2011 and Dave is at a different company now. What are regions 1, 2, and 7? Domestic, near-shore, and “the weird one we acquired.” Why is region 4 missing? Nobody remembers, and the one time someone “cleaned that up” they broke tax reporting for a quarter.

That’s not a syntax problem. That’s not even a SQL problem. That’s organizational memory encoded as magic numbers, and it does not survive contact with a migration unless a human who understands the business validates it.

It gets deeper. Code relies on implicit row ordering that happened to be stable on the old engine and isn’t guaranteed on the new one. It relies on a specific transaction isolation level because someone hit a race condition in 2015 and bumped it and never documented why. It relies on a batch job running at 2am before another one, an ordering enforced by nothing but cron and prayer. None of that is in the schema. Most of it isn’t even in the code — it’s in the gaps between the code, in the assumptions, in the tribal knowledge that walks out the door every time someone resigns.

”Just point an AI at it” — and why the bare-bones version face-plants

So here’s where someone in the room says: it’s 2026, just hand the whole thing to a coding agent and let it rip.

Look, I’m the last person who’s going to tell you AI can’t help here. It can. It’s genuinely good at the mechanical translation. Give it a chunk of PL/SQL and it’ll hand back respectable PL/pgSQL. I use these tools every day. That’s not the argument.

The argument is that a bare-bones agent, pointed at a legacy codebase with no context, will confidently get the dangerous parts wrong, and worse, it’ll be wrong in ways that look right. Here’s why:

  1. It can’t see what it can’t see. The agent works on the files you give it. The dynamic SQL assembled across three classes? The query the ORM generates at runtime? The proc that’s called from four services with four different assumptions? An agent reading one file at a time has no map of any of that. It doesn’t know what’s actually used, what’s dead, or what’s load-bearing.

  2. Your business is a black box to it. Hand it WHERE order_status != 3 and it’ll faithfully translate != to <> and call it a day. It has no idea that 3 is cancelled, that there’s a fifth status nobody uses anymore, or that the real requirement was “exclude cancelled and refunded” and the original code had a bug. It can’t ask Dave. Dave’s gone.

  3. It papers over behavioral traps instead of flagging them. Ask a model to convert NVL(x, 'n/a') and it’ll hand you COALESCE(x, 'n/a'), which is correct syntax and wrong behavior the moment empty-strings-as-NULL enters the picture. The model produced something plausible. Plausible is exactly the failure mode that makes it past code review.

  4. Fluent guessing, zero doubt. When the agent doesn’t know, it doesn’t stop and say “I don’t know.” It guesses, fluently, in well-formatted code. And a fluent guess in a 4,000-line stored procedure is so much more expensive to catch than a syntax error, because nothing turns red. You find it when the numbers don’t add up.

The result is a special kind of migration hell: the demo works, the happy path works, and then the long tail of inferred knowledge and behavioral drift bleeds out over months. You’ve traded a known amount of upfront work for an unknown amount of debugging in production. That’s a bad trade, and “the AI did it” is not a sentence you want to say to the CFO when the revenue report is off by 4%.

So what do you actually do?

I’m not here to tell you to do it all by hand. That’s its own kind of insane, and honestly nobody has the budget or the patience. The hand-translation-everything approach is how migrations end up on year three.

Here’s my actual take, after watching enough of these: the leverage isn’t in automating the work, it’s in seeing the work clearly before you start. You can’t plan, estimate, or safely delegate a migration you can’t see. The teams that succeed are the ones that, before writing a single line of converted code, can answer:

  • Where is all my SQL, including the dynamic and ORM-generated stuff?
  • Which of it ports clean, which needs a rewrite, and which is a behavioral trap that’ll pass tests and lie to me?
  • What’s the inferred knowledge (the magic numbers, the ordering assumptions, the isolation-level hacks) that a human has to confirm because no machine can?
  • How big is this, really, so I can give a number I won’t have to walk back?

Get that map, and everything downstream gets better, including your AI tools, because now you can hand your copilot a scoped, contextual task instead of “here’s 800,000 lines, good luck.” The agent stops guessing because you stopped making it guess.

That map is exactly the thing we built Swordfish to produce, and it’s why we built it as a helper for your existing tools instead of a black box that claims to do the whole thing. But that’s the next post.

For now, do me one favor. Go open your application repo (not the database, the app) and try to find every place it talks to the database. Grep for the obvious stuff. Then go look at the ORM. Then the dynamic SQL. Then ask yourself how many magic numbers are sitting in your WHERE clauses with nobody left who remembers what they mean.

I’ll bet it’s more than you think. It always is. That gap, right there, is why your migration is a developer problem — and why it’s going to take longer than the slide says.


Swordfish is an open-source (Apache-2.0) assessment harness for migrating Oracle, MySQL, SQL Server, Sybase, and DB2 to PostgreSQL — it shows you what’s in your codebase, what needs to change, and hands scoped tasks to the copilot you already use. Source: github.com/EnterpriseDB/swordfish-migrations