deep-dive
Finding the SQL Nobody Knows They Have
You grepped for `SELECT` and found 200 queries. There are 1,400. This post is about the other 1,200.
The first thing most teams do when they’re sizing a database migration is grep the codebase for SQL. It’s the right instinct and it produces a dangerous number, because grep finds the SQL that looks like SQL and misses the SQL that doesn’t. And in a mature application, most of the database access doesn’t look like a tidy SELECT ... FROM string sitting in a file. It’s assembled, generated, wrapped, and hidden. If your migration plan is built on the grep number, your plan is built on a fraction of the actual work.
So let’s go find where SQL actually hides, and how Swordfish drags it out.
The four hiding places
Embedded string literals: the easy ones. Plain SQL in a string: db.execute("SELECT * FROM orders WHERE status = 3"). Grep finds these. They’re maybe a third of what’s really there. Don’t let finding them give you confidence about the rest.
Dynamically assembled SQL: the nightmare. Query built by concatenation: sql = "SELECT * FROM orders WHERE 1=1"; if (filter) sql += " AND region = " + region;. There is no single string to grep for, because the query doesn’t exist as text until runtime. Grep sees fragments and shrugs. This is also where the worst migration bugs live, because dynamic SQL is where injection-prone, dialect-specific, hard-to-test queries accumulate.
ORM-generated SQL: invisible by design. Your Hibernate, SQLAlchemy, Entity Framework, or Django ORM emits SQL you never wrote and can’t grep, because it’s generated from method calls and model definitions. The ORM is supposed to abstract the database — which is great until you’re changing databases and need to know what it’s actually emitting and whether the generated SQL relies on source-dialect behavior.
SQL behind call-site wrappers. Teams wrap their data access: userRepo.findActive(), a helper that takes a query name, a builder pattern. The SQL is there, but it’s one or two indirections away from anything grep recognizes as a query.
How Swordfish finds what grep can’t
This is tier two of the four-tier funnel, and it’s three techniques working together:
Call-site detection. Instead of looking for SQL, Swordfish looks for the doorways to the database — the methods and APIs through which code talks to a DB. It carries 93 database-access signatures across seven languages: the execute/query/prepare family in each language’s DB drivers and ORMs. Find the call sites and you’ve found where SQL enters the system, including the dynamic and wrapped cases grep walks past. You’re not searching for the query; you’re searching for the door it goes through.
String extraction. At and around those call sites, Swordfish extracts the string arguments and the fragments that get concatenated into them, reconstructing as much of the query as is statically visible. It won’t perfectly resolve a query that’s assembled across ten conditionals at runtime, but it surfaces the fragments and the site, so a human (or the LLM tier) knows exactly where to look.
Dialect fingerprinting. Once it has a candidate query or fragment, the classifier fingerprints the dialect — does this look like Oracle, T-SQL, MySQL? This matters because it tells you not just “here’s a query” but “here’s a query written in the dialect you’re migrating off,” which is the set you actually have to deal with. A generic ANSI query that already works in PostgreSQL is noise; an Oracle-flavored one is work.
What this changes about your estimate
The practical payoff is that your migration estimate stops being a lie. The grep number says “200 queries, a couple weeks.” The real inventory — embedded plus dynamic plus ORM plus wrapped, dialect-fingerprinted so you know which ones are actually source-specific — says something honest, and it’s usually a multiple of the grep number. Better to learn that during assessment, when it’s a planning input, than during migration, when it’s a surprise that blows the timeline.
And it’s the same inventory that makes your coding agent useful, by the way. An agent can’t rewrite the SQL it can’t see any more than you can grep for it. Hand the agent “the 1,400 real call sites, the reconstructed query at each, the dialect of every one” and it can actually work. Hand it the codebase and “find the SQL” and it’ll do exactly as well as your grep did — which is to say, it’ll confidently miss the 1,200 that matter most.
Find the SQL nobody knows they have before you commit to a date. It’s almost always more than you think, it’s hiding in the places that are hardest to migrate, and it’s the difference between an estimate you can defend and one you’ll be apologizing for.
Swordfish is an open-source (Apache-2.0) assessment harness for migrating Oracle, MySQL, SQL Server, Sybase, and DB2 to PostgreSQL — it shows you what’s in your codebase, what needs to change, and hands scoped tasks to the copilot you already use. Source: github.com/EnterpriseDB/swordfish-migrations