Part 1: Migration Notes from MySQL to Postgresql Using pgloader

Recently I migrated a small but non-trivial (25 tables, about 500 columns, no triggers or SPs) MySQL schema to Postgresql using the Open Source pgloader utility.

pgloader supports migration from several databases/formats (MySQL, Sqlite, MS SQL, dBase, CSV) to Postgresql.

pgloader is fast: it took about 20 seconds to migrate/load both the schema and data into Postgresql 9.2 on my Mac notebook running in Virtualbox. The total number of rows was about 3 million narrow rows.

The impressive speed of pgloader lets you iterate quickly when writing pgloader rules, and lets you embed the loading process into automated scripts without any concern about performance.

Executive Summary

pgloader is a free scriptable ETL tool that lets you quickly migrate database schemas and data to Postgresql. It is helpful for setting up a PoC to estimate the work required for a full migration analysis. (It does not do SQL syntax or application code migration.) Although pgloader is excellent at what it does, any database migration is a major project.

Installation on CentOS 7

yum install freetds postgresql
createdb mydb
# unpack and run the build script from github or use the Docker image
make pgloader
pgloader mysql://root@localhost/mydb postgresql:///mydb

Installation on Debian or Ubuntu

apt-get install pgloader postgresql
createdb mydb
pgloader mysql://root@localhost/mydb postgresql:///mydb

pgloader pluses:

– free
– fast
– 100% scriptable and customizable with pgloader rules (called a “load file”)
– reasonable result for a first pass even without custom pgloader rules
– “pro DBA” feeling – accepts configuration file, and emits log files, similar to the behavior of Oracle’s BCP utility.

pgloader minuses:

– not included with Postgres
– unusual in that it is written in Lisp, but that is not user-visible
– doesn’t create Postgresql users/roles (to be expected as the SQL standard doesn’t specify these, thus they vary greatly across databases)
– doesn’t convert MySQL partitions to Postgresql inheritance-syntax partitions in 8.x or 9.x, but should work with declarative-syntax partitions in 10.x+
– detects and double-quotes reserved object names, but doesn’t notify of schema name conflicts with the Postgresql public namespace ie. table name ‘user’
– will require time-consuming schema cleanup for most use cases. See below.

My pgloader project files are available for download from my github.

Schema/Data Cleanup Notes

  • MySQL timestamps do not display the tz, but by default pg shows … “+00” unless you specify “timestamp without time zone”, or you use EXTRACT() or TO_CHAR(). Read about related pgloader rules here.
  • MySQL text and varchar columns can be migrated with varying results to pg text and bytea types
  • treat the table name ‘user’ as reserved since there is a public.user symbol that overrides the search path
  • don’t underestimate how long schema cleanup will take. Although pgloader runs quickly, Postgresql does not do casting automatically, so is extremely sensitive to application SQL statements
  • MySQL and Postgresql have different models for representing users/roles and timezones that need to be dealt with sooner than later. Here is some advice on timezone setting: Adding timezone to naive datetime fields from MySQL #331
  • migrating applications from MySQL to Postgresql is easier with Postgresql 9.5 since it has INSERT … ON CONFLICT DO UPDATE (UPSERT.)
  • index names are table-specific in MySQL, but schema-wide in Postgresql, so by default pgloader names them idx_NNNNN_name to uniqify them. If you need to use named indexes, like with ON CONFLICT ON CONSTRAINT in 9.5, then you either need to uniqify the index names in MySQL and do pgloader –with “preserve index names”, or add a pgloader rule to do CREATE INDEX your_index_name.

My Migration Results

After renaming the table ‘user’ to ‘users’ and altering the sessions table (see above), around half of the SELECT queries worked as-is and I could login to the application and click around. 🙂

However, virtually all of the INSERT and UPDATE statements had to be rewritten, taking 2 man days to get to an alpha version and 2 weeks for something worth doing formal QA.

Database Migration Alternatives

Amazon AWS provides 3 powerful data migration tools under the AWS Database Migration Service banner that are either free to use, or have a 6 month trial.

Click here to read Part 2.

pgloader: Homepage, github, Manual, Licence
postgresql.org: pg_dump
http://rhaas.blogspot.com: The State of VACUUM
severalnines.com: Upgrading Your Database to PostgreSQL Version 10 – What You Should Know

This entry was posted in MySQL, Open Source, Postgresql, Tech. Bookmark the permalink.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.