Goldfish

essay

A Tour of the Swordfish Sample Apps (and What Breaks in Each)

The hardest part of evaluating a database migration tool is finding something honest to point it at.

Matt Yonkovit · 6 min read

The hardest part of evaluating a database migration tool is finding something honest to point it at. A hand-written demo schema tells you nothing — real migrations don’t fall over on CREATE TABLE, they fall over on the report someone wrote in 2014 that pages with ROWNUM, or the stored procedure that locks two tables and assumes session-scoped semantics that PostgreSQL doesn’t share.

The Swordfish sample apps exist to be that honest target. Each one is a working application carrying the kind of database-specific baggage that turns a “simple” migration into a month of surprises. This is a tour of what’s in the box, app by app, and the sorts of things that break when you aim each at PostgreSQL.

A note before we start: nearly everything below is intentional. Each app ships with an answer key that catalogs every embedded gotcha with a file, a line, and a difficulty rating, so you can scan an app and then score what the tool actually caught.

It all lives in one repo: github.com/EnterpriseDB/swordfish-sample-appsgit clone https://github.com/EnterpriseDB/swordfish-sample-apps.git to follow along.


MySQL

ecommerce-php — a PHP storefront (“ShopEasy”). Product catalog, search, cart, checkout, and an admin panel with reporting. It looks like a thousand small PHP shops that grew up around MySQL.

What breaks on the way to PostgreSQL: full-text search written as MATCH ... AGAINST (PostgreSQL uses tsvector/tsquery and a different index type), REPLACE INTO and INSERT ... ON DUPLICATE KEY UPDATE (PostgreSQL’s ON CONFLICT works differently), string-based advisory locks via GET_LOCK()/RELEASE_LOCK(), LOCK TABLES for a batch report, ENUM and SET columns, ON UPDATE CURRENT_TIMESTAMP, and a data layer built entirely on the mysqli driver and mysqli_multi_query — none of which has a drop-in PostgreSQL equivalent.

api-node — a Node.js/Express social API (“MobileConnect”). Users, posts, messaging, notifications, media, and “find people near me.” The interesting parts cluster around location and bulk operations.

What breaks: a POINT column with a SPATIAL INDEX and ST_GeomFromText(..., 4326) (PostgreSQL leans on PostGIS, with a GEOGRAPHY type and GiST indexes — and, famously, the opposite latitude/longitude axis order), ST_Distance_Sphere() for nearby queries, LOAD DATA INFILE for bulk import (PostgreSQL uses COPY, with a completely different file-access model), a STORED generated column for full names, TINYINT(1) standing in for boolean, and the same mysqli/CALL-a-stored-procedure plumbing as the storefront.


SQL Server

hr-dotnet — a C#/.NET 8 HR and payroll system. Employees, an org chart, payroll runs, benefits, time tracking. Classic enterprise .NET on top of T-SQL.

What breaks: @@IDENTITY for insert IDs, ISNULL() everywhere instead of COALESCE, TOP n without an ORDER BY, WITH (NOLOCK) and WITH (TABLOCKX) hints, string aggregation done the old way with STUFF() + FOR XML PATH(''), CROSS APPLY/OUTER APPLY, FORMAT() with .NET format strings, CONVERT() with numeric style codes, TRY...CATCH with ERROR_NUMBER()/ERROR_MESSAGE(), #temp and ##global temp tables, plus schema features like HIERARCHYID for the org chart, computed columns, and row-level security predicate functions.

etl-python — a Python data-warehouse ETL pipeline. Pulls from staging tables into a star schema, with slowly-changing dimensions and indexed-view aggregates. The SQL Server gotcha list from hr-dotnet, plus the ones that show up specifically in warehousing.

What breaks (on top of the SQL Server staples above): BULK INSERT of CSVs, MERGE for SCD Type 2 dimension loads, indexed views that need specific SET options, a partition function and scheme on the fact table, and table variables (DECLARE @t TABLE). It’s the app to reach for if you want to see how a tool handles ETL rather than OLTP.


Sybase ASE

Sybase shares T-SQL’s roots with SQL Server, so a lot looks familiar — until the semantics diverge in ways that don’t throw errors, they just behave differently.

trading-java — a Java trade execution and settlement system. Trades, positions, settlement runs, daily P&L.

What breaks: @@rowcount and @@identity checked after nearly every statement, HOLDLOCK/NOHOLDLOCK hints, COMPUTE BY for break-report subtotals (there’s no PostgreSQL equivalent — it has to become window functions or app-side rollups), SET ROWCOUNT n to cap affected rows, nested transactions where an inner COMMIT is a no-op because of @@trancount, WAITFOR DELAY, and RAISERROR with Sybase’s severity/state syntax. The driver layer uses jConnect JDBC.

risk-python — a Python risk-reporting engine. Value-at-Risk, stress tests, limit monitoring, regulatory snapshots. Same Sybase dialect, pointed at quant workloads, plus UNITEXT large-text columns that need careful handling on the way out.


IBM DB2

DB2 is standards-adjacent in places, which is its own trap: the syntax is close to PostgreSQL, so the differences are easy to miss.

claims-java — a Java insurance-claims processor. Intake, adjudication, payments, fraud checks, regulatory reporting, with materialized query tables (MQTs) behind the reports.

What breaks: FETCH FIRST n ROWS ONLY paired with OPTIMIZE FOR n ROWS, VALUES used as a standalone statement, NEXT VALUE FOR sequences, isolation hints baked into queries (WITH UR/CS/RS/RR), DB2’s own MERGE, LATERAL joins, OLAP window-function extensions, SQL PL BEGIN ATOMIC compound statements, DECLARE GLOBAL TEMPORARY TABLE, error handling via SQLSTATE/SQLCODE with DECLARE HANDLER, GET DIAGNOSTICS, SIGNAL SQLSTATE, and LOCK TABLE ... IN EXCLUSIVE MODE.

actuarial-python — a Python actuarial analytics platform. Mortality analysis, loss ratios, IBNR reserve estimation, regulatory filings. The DB2 SQL/PL list above, plus the actuarial wrinkle that matters most: DECFLOAT precision used throughout to avoid floating-point rounding, recursive CTEs with SEARCH/CYCLE clauses, and bulk loads via ADMIN_CMD.


Oracle

mega-app — a multi-language order-management system. This is the big one: a Java CLI and web app, plus Go, Python, and PHP microservices, all over a single Oracle schema with 96+ database objects. It’s the closest thing here to a sprawling real-world Oracle estate.

What breaks — and there’s a lot: Oracle Object Types with methods, CONNECT BY/SYS_CONNECT_BY_PATH hierarchical queries (PostgreSQL needs recursive CTEs), ROWNUM pagination, BINARY_DOUBLE/BINARY_FLOAT types, XMLType columns and XML processing, SYS_GUID(), the NVL/NVL2/DECODE family instead of COALESCE/CASE, Oracle’s MERGE, package dependencies like DBMS_OUTPUT/DBMS_LOB, autonomous transactions (PRAGMA AUTONOMOUS_TRANSACTION), sequences accessed via .NEXTVAL/.CURRVAL, INSTEAD OF triggers on views, materialized views, and PL/SQL packages with pipelined functions. The multi-language layer also lets you see how the same Oracle-isms surface across four different driver stacks.


The OSS clones

The purpose-built apps are dense by design. The frozen open-source clones are the reality check — real projects, cloned at fixed commits, with all the mess that real projects carry.

  • DNN Platform (SQL Server) — hundreds of stored procedures with genuine business logic, dynamic SQL, cursors, and XML handling.
  • WideWorldImporters (SQL Server) — Microsoft’s “kitchen sink” sample: temporal tables, CLR, columnstore, graph, in-memory OLTP, JSON, spatial, full-text, row-level security, and dynamic data masking.
  • Magento 2 (MySQL) — 300+ tables and the notorious EAV schema, with MySQL-specific SQL buried deep in the codebase.
  • ERPNext (MySQL) — a full ERP whose “database-agnostic” framework still leaks plenty of MySQL-isms.
  • Alexandria PL/SQL Utils (Oracle) — 20+ PL/SQL packages built on DBMS_*/UTL_*, the dependencies that are hardest to port.
  • iDempiere (Oracle and PostgreSQL) — the most useful one for grading a tool: it ships both an Oracle and a PostgreSQL implementation, so you can compare a tool’s output against a real, human-made migration (NVLCOALESCE, SYSDATECURRENT_DATE, CONNECT BY→recursive CTE, (+) outer joins→LEFT JOIN, and so on).

Running a test

The shortest path: pick an app, start Swordfish, create a project with the app’s source database and a PostgreSQL target, and add the app’s code or its sql/ DDL files as an input — no running database required for either. Run the scan, then open the matching answer key under docs/answer-keys/ and tally what the tool found against what’s documented. If you want to test the live-database path too, the MySQL, SQL Server, and Oracle apps ship with Docker setups; Sybase and DB2 need licensed images you bring yourself.

Whichever app you start with, the question it’s built to answer is the same: when the tool meets the weird, real stuff, how much does it actually catch?