Being famous is not always a good thing …
There are at least 5 problems related to the ongoing, serious Meltdown and Spectre CPU security bugs (AWS announcement) that affect the Database Administrator (DBA):
- in shared environments, like AWS or VMs, neighbouring VMs can read your data on unpatched systems. One privacy fix is to provision an entire server for yourself.
- forthcoming patches might work, or not. Complex security patches often don’t address the issue initially, so there will be a sequence of related patches (whack-a-mole, like Shellshock) that will affect database uptime and cache performance. AWS has revised the related announcement page more than 12 times in 2018. Say good-bye to your 400-day uptimes!
- the patches are reported to consume more memory and reduce benchmark performance by 33% on Linux 4.2.0 on Intel processors. If your database server is configured to use 90% of RAM, as with MySQL’s innodb_buffer_pool_size, consider lowering that to 80% or 75% to avoid OOMs.
- in AWS, significant clock skew has been reported, so add that to your monitoring.
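For the clock skew item, a minimal monitoring sketch could compare the local clock against an NTP server with a plain SNTP query. This is an illustration under stated assumptions, not a replacement for chrony or ntpd: the function names and the 0.5 second threshold are made up for the example, the server name in the usage comment is a placeholder, and network round-trip delay is ignored.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp (Unix seconds) from an SNTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply shorter than 48 bytes")
    # The transmit timestamp is bytes 40-47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp(server: str, timeout: float = 2.0) -> float:
    """Send one SNTP client request and return the server's time (Unix seconds)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)


def skew_exceeds(local_time: float, reference_time: float,
                 threshold_seconds: float = 0.5) -> bool:
    """True when the absolute difference between the clocks exceeds the threshold."""
    return abs(local_time - reference_time) > threshold_seconds


# Hypothetical usage (needs network access; the server name is a placeholder):
#   if skew_exceeds(time.time(), query_ntp("pool.ntp.org")):
#       print("clock skew above threshold")
```

A check like this can run from cron or a monitoring agent and alert when `skew_exceeds` returns True. Because it ignores network delay, the threshold should be set well above the expected round-trip time.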
innodb_buffer_pool_size can be set dynamically in MySQL 5.7 with some caveats:
SET GLOBAL innodb_buffer_pool_size = 4294967296;  -- 4 GiB; SET does not accept size suffixes such as 4G at runtime
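One of the caveats is that MySQL 5.7 expects the buffer pool size to be a multiple of innodb_buffer_pool_chunk_size times innodb_buffer_pool_instances, and adjusts the value if it is not. The sketch below computes a size at or below a chosen fraction of RAM, aligned down to that unit. The helper names are hypothetical, and the 128 MiB chunk size and 8 instances are assumed defaults, so check the values on your own server first.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def buffer_pool_bytes(total_ram_bytes: int, fraction: float = 0.75,
                      chunk_bytes: int = 128 * MIB, instances: int = 8) -> int:
    """Return a buffer pool size no larger than fraction * RAM, rounded down
    to a multiple of chunk_bytes * instances (never below one such unit)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    unit = chunk_bytes * instances
    target = int(total_ram_bytes * fraction)
    aligned = (target // unit) * unit
    return max(aligned, unit)


def set_statement(size_bytes: int) -> str:
    """Build the SET GLOBAL statement using a plain byte count (no suffix)."""
    return f"SET GLOBAL innodb_buffer_pool_size = {size_bytes};"


# Example: a server with 64 GiB of RAM, targeting 75% for the buffer pool.
print(set_statement(buffer_pool_bytes(64 * GIB)))
```

Rounding down rather than up keeps the result under the target fraction, which is the point when the goal is to leave headroom against OOMs.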
The above applies doubly to server consolidation and microservices in VMs.
Of course, if you’re an experienced production DBA, then you never trusted VMs anyway. 🙂
> Measureable: 8-12% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-12%. Examples include Oracle OLTP (tpm), MariaDB (sysbench), Postgres (pgbench), netperf (< 256 byte), fio (random IO to NvME).
> Modest: 3-7% – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measureable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005 w/ucode and SQLserver, and MongoDB.
I’ll leave it to others to pontificate on what it means when you can’t trust any desktop, server or mobile computer in an Internet-connected world. Or what HIPAA compliance means in the cloud where your server is a party-line telephone.
forums.aws.amazon.com: Degraded performance after forced reboot due to AWS instance maintenance, HN
ARM: Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism
Escaping Docker container using waitid() – CVE-2017-5123
theregister.co.uk: Azure VMs borked following Meltdown patch, er, meltdown
CPU hardware vulnerable to side-channel attacks (Replace CPU hardware), HN (I called this in advance, but there needs to be two steps: re-design CPUs in 2018 if there’s no possible microcode update, then replace them in 2019)
blog.appoptics.com: Visualizing Meltdown on AWS
Intel alerted computer makers to chip flaws on Nov 29 – new claim – Total coincidence: That’s the same day Chipzilla’s CEO sold off his shares
zdnet.com: Researchers discover seven new Meltdown and Spectre attacks, HN discussion
phoronix.com: Bisected: The Unfortunate Reason Linux 4.20 Is Running Slower, HN
aws.amazon.com: Processor Speculative Execution Research Disclosure
forums.aws.amazon.com: Spectre/Meltdown Vulnerabilities – AWS please clarify
Potentially disastrous Rowhammer bitflips can bypass ECC protections, HN
Keywords: Spectre, Specter, Meltdown, Rowhammer