We’re lucky to have two great Open Source databases, MySQL and Postgresql.
One of the killer features in both MySQL and Postgresql is table partitions – for example, most Silicon Valley adtech companies are powered by MySQL partitions.
They let enterprises, and growing startups, easily manage large volumes of data.
- drop millions of rows without purge thread or VACUUM workload, esp. useful for time-series (logging) and retention compliance requirements
- save storage space by implying the key in the table name
- allow transportable tablespaces for archiving and repair or storage management
- allow smaller table scans and index lookups by isolating the row range
- potentially allow “parallel query execution” (I believe Oracle has a patent on this, but eventually patents expire.)
- potentially allow parallel writes across partitions. I believe this already works in recent versions of MySQL/InnoDB and since Postgresql 8.0.
- syntax restrictions on secondary keys, triggers, etc. so read the manual first
- pgloader does not support CREATE PARTITION as of Jan. 27, 2018. (The author is looking for sponsorship however.)
Simpler syntax for Postgresql 10, known as declarative partitioning, has been announced that is similar to MySQL’s syntax.
Here’s a chart comparing partitions across database products:
|Table Partition Syntax
The MySQL bug database lists many serious partition bugs in 5.0 and 5.1, so it’s important to use the latest version possible. MySQL also supports merge tables for MyISAM tables.
Chapter 15 of “PostgreSQL 9 High Performance Book Review” covers Postgresql 9 partitions.
PostgreSQL Partition Manager Extension (pg_partman), Blog, github
Yahoo MySQL Partition Manager, Blog
PostgreSQL 10: Partitions of… partitions! HN Discussion
wiki.postgresql.org: Table partitioning
Creating partitions automatically in PostgreSQL
rhaas.blogspot.com: Plans for Partitioning in v11
reddit: What are some bad things about PostgreSQL?
PostgreSQL 10 – table partitioning – how to check partitions and manipulate with them
The last 747 has been retired from passenger service in the USA.
I’ll miss the 747 because it was the only widely-used airliner built for trans-oceanic flights:
- For safety reasons, I’d rather have 4 engines than 2 for trans-oceanic flights. The slow acceptance of twin-engined airliners under ETOPS over the past few decades means greater fuel efficiency for airlines, but in a pinch you’re left with one engine, which is not a good situation for passengers.
- The 777’s that I’ve flown on to Asia don’t have eyeball air vents, so I end up roasting on long flights.
Delta Retires Last USA 747 to Boneyard
Malaysia hunts owners of Three Boeing 747s abandoned at airport
This is a multi-part series starting with Part 1.
After perfecting the schema and data migration with pgloader rules and getting the application sessions and login code working, it was time to convert the non-working SQL statements to Postgres syntax.
Here’s the SQL syntax I had to change (note: I use placeholders (?) whenever possible):
||FLOOR(EXTRACT(EPOCH FROM ?))
|DATE_ADD(?, INTERVAL ? unit)
||? + INTERVAL ‘$x timeunit’
||Perl DBI can do placeholder with ?::interval and “$x timeunit”
|DATE_SUB(?, INTERVAL ? unit)
||? – INTERVAL ‘$x timeunit’
||see above for interval placeholder info
|? = 0 or ‘0000-00-00 00:00:00’
||? IS NULL
||in the case when the action is not important. PG 9.5: INSERT … ON DUPLICATE KEY IGNORE
||in the case when rows are immutable (never updated). PG 9.5: INSERT … ON DUPLICATE KEY UPDATE
||CASE WHEN … THEN … ELSE … END
|LIMIT ?, ?
||LIMIT ? OFFSET ?
||can use the pg syntax for both databases
||Postgresql is stricter about the columns list here
|ORDER BY NULL
||MySQL optimization only
|SQL_CALC_ … FOUND_ROWS
||MySQL optimization only
||EXTRACT(HOUR FROM ?)
||Postgresql has public.user, so treat the user table name as a reserved symbol
|UPDATE table1, table2 …
||UPDATE table1 … WHERE id IN (SELECT id FROM table2 WHERE …)
|last insert id (various)
Don’t forget to:
- ensure string lengths don’t exceed your schema column widths, and that Postgresql char() space-padding isn’t an issue for your application code
- verify how Postgresql dates use timezones
- run EXPLAIN on each of your statements.
Perl CGI::Session Notes
pgloader migrates the MySQL CGI::Session table using a Postgresql text column. This works better:
alter table sessions alter a_session type bytea using a_session::bytea;
Being famous is not always a good thing …
There’s at least 5 problems related to the on-going Meltdown and Spectre
serious CPU security bugs (AWS announcement)
that impact the Database Administrator (DBA):
- in shared environments, like AWS or VMs, neighbour VMs can read/write your data on unpatched systems. A privacy solution is to provision the entire server to yourself.
- forthcoming patches might work, or not. Complex security patches often don’t address the issue initially, so there will be a sequence of related patches (whack-a-mole, like Shellshock) that will affect database uptime and cache performance. Say good-bye to your 400-day uptimes!
- the patches are reported to consume more memory and reduce system performance. If your database server is configured, like with MySQL’s
innodb_buffer_pool_size, to use 90% of RAM you should consider 80% or 75% to avoid OOMs.
- in AWS, significant clock skew has been reported, so add that to your monitoring.
innodb_buffer_pool_size can be set dynamically in MySQL 5.7 with some caveats:
SET GLOBAL innodb_buffer_pool_size=4G;
The above applies doubly to server consolidation and microservices in VMs.
Of course, if you’re an experienced production DBA, then you never trusted VMs anyway.
Some numbers from Redhat (paywalled):
> Measureable: 8-12% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-12%. Examples include Oracle OLTP (tpm), MariaBD (sysbench), Postgres(pgbench), netperf (< 256 byte), fio (random IO to NvME).
>Modest: 3-7% – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measureable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005 w/ucode and SQLserver, and MongoDB.
Redis: Meltdown fix impact on Redis performances in virtualized environments
Cassandra: Meltdown/Spectre Linux patch – Performance impact on Cassandra?
I’ll leave it to others to pontificate on what it means when you can’t trust any desktop, server or mobile computer in an Internet-connected world. Or what HIPAA compliance means in the cloud where your server is a party-line telephone.
forums.aws.amazon.com: Degraded performance after forced reboot due to AWS instance maintenance , HN
ARM: Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism
Escaping Docker container using waitid() – CVE-2017-5123
theregister.co.uk: Azure VMs borked following Meltdown patch, er, meltdown
CPU hardware vulnerable to side-channel attacks (Replace CPU hardware), HN (I called this in advance, but there needs to be two steps: re-design CPUs in 2018 if there’s no possible microcode update, then replace them in 2019)
blog.appoptics.com: Visualizing Meltdown on AWS
Intel alerted computer makers to chip flaws on Nov 29 – new claim – Total coincidence: That’s the same day Chipzilla’s CEO sold off his shares