Table Partitions in MySQL and Postgresql

We’re lucky to have two great Open Source databases, MySQL and Postgresql.

One of the killer features in both MySQL and Postgresql is table partitions – for example, most Silicon Valley adtech companies are powered by MySQL partitions.

They let enterprises, and growing startups, easily manage large volumes of data.

Pluses

  1. drop millions of rows without purge thread or VACUUM workload, esp. useful for time-series (logging) and retention compliance requirements
  2. save storage space by implying the key in the table name
  3. allow transportable tablespaces for archiving and repair or storage management
  4. allow smaller table scans and index lookups by isolating the row range
  5. potentially allow “parallel query execution” (I believe Oracle has a patent on this, but eventually patents expire.)
  6. potentially allow parallel writes across partitions. I believe this already works in recent versions of MySQL/InnoDB and since Postgresql 8.0.

Minuses

  1. syntax restrictions on secondary keys, triggers, etc. so read the manual first
  2. pgloader does not support CREATE PARTITION as of Jan. 27, 2018. (The author is looking for sponsorship however.)

Simpler syntax for Postgresql 10, known as declarative partitioning, has been announced that is similar to MySQL’s syntax.

Here’s a chart comparing partitions across database products:

Feature MySQL 5.0+ Postgresql 8.0+ Postgresql 10.0 Postgresql 11.0
Table Partition Syntax 5.7 9.6 10 TBD

The MySQL bug database lists many serious partition bugs in 5.0 and 5.1, so it’s important to use the latest version possible. MySQL also supports merge tables for MyISAM tables.

Chapter 15 of “PostgreSQL 9 High Performance Book Review” covers Postgresql 9 partitions.

PostgreSQL Partition Manager Extension (pg_partman), Blog, github
Yahoo MySQL Partition Manager, Blog
PalominoDB pdb-parted
PostgreSQL 10: Partitions of… partitions! HN Discussion
wiki.postgresql.org: Table partitioning
Creating partitions automatically in PostgreSQL
rhaas.blogspot.com: Plans for Partitioning in v11
reddit: What are some bad things about PostgreSQL?
PostgreSQL 10 – table partitioning – how to check partitions and manipulate with them

Posted in MySQL, MySQL Cluster, Open Source, Oracle, Postgresql, Tech | Leave a comment

Postgresql Concepts for MySQL Users

Uniqueness in PostgreSQL: Constraints versus Indexes
Avoid naming a constraint directly when using ON CONFLICT DO UPDATE
PostgreSQL Upsert Using INSERT ON CONFLICT statement
How to install PostgreSQL 9.5 on CentOS 7

Alias for table name in SQL insert statement
Postgres 9.5 feature highlight – Upsert
psql: FATAL: database “” does not exist
psql: FATAL: Ident authentication failed for user “username” Error and Solution
Postgresql Repos: 9.5, 9.6
Postgresql Sample Databases
Postgres.app
brew install postgresql@9.6

Posted in Open Source, Postgresql, Tech | Leave a comment

Last USA 747 Passenger Airplane Retired

The last 747 has been retired from passenger service in the USA.

I’ll miss the 747 because it was the only widely-used airliner built for trans-oceanic flights:

  1. For safety reasons, I’d rather have 4 engines than 2 for trans-oceanic flights. The slow acceptance of twin-engined airliners under ETOPS over the past few decades means greater fuel efficiency for airlines, but in a pinch you’re left with one engine, which is not a good situation for passengers.
  2. The 777’s that I’ve flown on to Asia don’t have eyeball air vents, so I end up roasting on long flights.

Delta Retires Last USA 747 to Boneyard

Malaysia hunts owners of Three Boeing 747s abandoned at airport

Posted in Tech | Leave a comment

Part 2: Migration Notes from MySQL to Postgresql Using pgloader

This is a multi-part series starting with Part 1.

After perfecting the schema and data migration with pgloader rules and getting the application sessions and login code working, it was time to convert the non-working SQL statements to Postgres syntax.

Here’s the SQL syntax I had to change (note: I use placeholders (?) whenever possible):

MySQL Postgresql 9.1 Notes/PG 9.5
UNIX_TIMESTAMP(?) FLOOR(EXTRACT(EPOCH FROM ?))
FROM_UNIXTIME(?) TO_TIMESTAMP(?)
DATE_ADD(?, INTERVAL ? unit) ? + INTERVAL ‘$x timeunit’ Perl DBI can do placeholder with ?::interval and “$x timeunit”
DATE_SUB(?, INTERVAL ? unit) ? – INTERVAL ‘$x timeunit’ see above for interval placeholder info
? = 0 or ‘0000-00-00 00:00:00’ ? IS NULL
RAND() RANDOM()
INSERT IGNORE INSERT in the case when the action is not important. PG 9.5: INSERT … ON DUPLICATE KEY IGNORE
REPLACE INTO custom code in the case when rows are immutable (never updated). PG 9.5: INSERT … ON DUPLICATE KEY UPDATE
IF() CASE WHEN … THEN … ELSE … END
LIMIT ?, ? LIMIT ? OFFSET ? can use the pg syntax for both databases :)
GROUP BY GROUP BY Postgresql is stricter about the columns list here
ORDER BY NULL MySQL optimization only
SQL_CALC_ … FOUND_ROWS SELECT COUNT(*) MySQL optimization only
DATE(?) TO_CHAR(?, ‘YYYY-MM-DD’)
HOUR(?) EXTRACT(HOUR FROM ?)
table `user` table users Postgresql has public.user, so treat the user table name as a reserved symbol
UPDATE table1, table2 … UPDATE table1 … WHERE id IN (SELECT id FROM table2 WHERE …)
last insert id (various) RETURNING

Don’t forget to:

  • ensure string lengths don’t exceed your schema column widths, and that Postgresql char() space-padding isn’t an issue for your application code
  • verify how Postgresql dates use timezones
  • run EXPLAIN on each of your statements. :)

Perl CGI::Session Notes

pgloader migrates the MySQL CGI::Session table using a Postgresql text column. This works better:

alter table sessions alter a_session type bytea using a_session::bytea;
Posted in Open Source, Postgresql, Tech | Leave a comment

PSA: Intel and AMD Security Bugs and the DBA


CNN.com homepage featuring Meltdown and Spectre
Being famous is not always a good thing …

There’s at least 5 problems related to the on-going Meltdown and Spectre serious CPU security bugs (AWS announcement) that impact the Database Administrator (DBA):

  1. in shared environments, like AWS or VMs, neighbour VMs can read/write your data on unpatched systems. A privacy solution is to provision the entire server to yourself.
  2. forthcoming patches might work, or not. Complex security patches often don’t address the issue initially, so there will be a sequence of related patches (whack-a-mole, like Shellshock) that will affect database uptime and cache performance. Say good-bye to your 400-day uptimes!
  3. the patches are reported to consume more memory and reduce system performance. If your database server is configured, like with MySQL’s innodb_buffer_pool_size, to use 90% of RAM you should consider 80% or 75% to avoid OOMs.
  4. in AWS, significant clock skew has been reported, so add that to your monitoring.
  5. there are Javascript exploits to read your notebook. That means if you connect to a remote database server with a database client or monitoring program from your notebook, your credentials can be read/changed. So keep your notebook OS and browser(s) up-to-date.

Note: innodb_buffer_pool_size can be set dynamically in MySQL 5.7 with some caveats:

SET GLOBAL innodb_buffer_pool_size=4G;

The above applies doubly to server consolidation and microservices in VMs.

Of course, if you’re an experienced production DBA, then you never trusted VMs anyway. :)

Some numbers from Redhat (paywalled):

> Measureable: 8-12% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-12%. Examples include Oracle OLTP (tpm), MariaBD (sysbench), Postgres(pgbench), netperf (< 256 byte), fio (random IO to NvME).

>Modest: 3-7% – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measureable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005 w/ucode and SQLserver, and MongoDB.

Redis: Meltdown fix impact on Redis performances in virtualized environments
Cassandra: Meltdown/Spectre Linux patch – Performance impact on Cassandra?

I’ll leave it to others to pontificate on what it means when you can’t trust any desktop, server or mobile computer in an Internet-connected world. Or what HIPAA compliance means in the cloud where your server is a party-line telephone.

forums.aws.amazon.com: Degraded performance after forced reboot due to AWS instance maintenance , HN
ARM: Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism
Escaping Docker container using waitid() – CVE-2017-5123
theregister.co.uk: Azure VMs borked following Meltdown patch, er, meltdown
CPU hardware vulnerable to side-channel attacks (Replace CPU hardware), HN (I called this in advance, but there needs to be two steps: re-design CPUs in 2018 if there’s no possible microcode update, then replace them in 2019)
blog.appoptics.com: Visualizing Meltdown on AWS
Intel alerted computer makers to chip flaws on Nov 29 – new claim – Total coincidence: That’s the same day Chipzilla’s CEO sold off his shares

Posted in Microservices, MySQL, MySQL Cluster, Tech | Leave a comment