Being famous is not always a good thing …
There are at least 5 problems related to the serious, ongoing Meltdown and Spectre CPU security bugs (AWS announcement) that impact the Database Administrator (DBA):
- in shared environments, like AWS or VMs, neighbouring VMs can read your data on unpatched systems. One privacy fix is to provision an entire server for yourself.
- forthcoming patches might work, or might not. Complex security patches often fail to fix the issue at first, so expect a sequence of related patches (whack-a-mole, like Shellshock) that will affect database uptime and cache performance. AWS has revised the related announcement page more than 12 times in 2018. Say good-bye to your 400-day uptimes!
- the patches are reported to consume more memory and reduce benchmark performance by 33% on Linux 4.2.0 on Intel processors. If your database server is configured to use 90% of RAM, as with MySQL’s innodb_buffer_pool_size, you should consider 80% or 75% to avoid OOMs.
- in AWS, significant clock skew has been reported, so add that to your monitoring.
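As a starting point for the clock-skew monitoring, here is a minimal Python sketch. It assumes you can get a timestamp from some clock you trust more than the local one (an NTP server, another host, or the database server itself). The names `estimate_skew`, `skew_alert` and `fetch_remote_time` are made up for this illustration, and the 0.5 second threshold is an arbitrary placeholder. The estimate uses the round-trip midpoint (Cristian's algorithm), so it is only as good as the round trip is symmetric.

```python
import time


def estimate_skew(local_before, remote_time, local_after):
    """Estimate how far the remote clock is ahead of the local clock.

    local_before / local_after are local timestamps taken just before and
    just after asking the remote side for its time. Returns a tuple of
    (offset_seconds, uncertainty_seconds); a positive offset means the
    remote clock is ahead of the local one.
    """
    round_trip = local_after - local_before
    if round_trip < 0:
        raise ValueError("local clock went backwards during the measurement")
    # Assume the remote timestamp was taken halfway through the round trip.
    midpoint = local_before + round_trip / 2
    return remote_time - midpoint, round_trip / 2


def skew_alert(offset, uncertainty, threshold=0.5):
    """True if the skew exceeds the threshold even after allowing for
    the measurement uncertainty."""
    return abs(offset) - uncertainty > threshold


if __name__ == "__main__":
    # Hypothetical stand-in for a real query to a trusted clock;
    # here it simply pretends the remote clock runs 2 seconds ahead.
    def fetch_remote_time():
        return time.time() + 2.0

    before = time.time()
    remote = fetch_remote_time()
    after = time.time()

    offset, uncertainty = estimate_skew(before, remote, after)
    print(f"offset={offset:.3f}s uncertainty={uncertainty:.3f}s "
          f"alert={skew_alert(offset, uncertainty)}")
```

In a real setup you would replace `fetch_remote_time` with an actual query and feed the offset into whatever monitoring system you already run.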
innodb_buffer_pool_size can be set dynamically in MySQL 5.7 with some caveats:
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024; -- 4 GiB; SET takes a byte count, the G suffix only works in option files
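To make the 90% versus 80% or 75% arithmetic concrete, here is a rough Python sketch that turns a RAM fraction into a byte count and prints the matching statement. One of the caveats is that MySQL adjusts the buffer pool to a multiple of innodb_buffer_pool_chunk_size times innodb_buffer_pool_instances, so the sketch rounds down to that unit itself. The 128 MiB chunk size and 8 instances are assumed values; check yours with SHOW VARIABLES. The function names are invented for this example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def buffer_pool_bytes(total_ram_bytes, fraction,
                      chunk_bytes=128 * MIB, instances=8):
    """Return a buffer pool size close to fraction * RAM, rounded down to a
    whole multiple of chunk_bytes * instances (never less than one unit)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    unit = chunk_bytes * instances
    target = int(total_ram_bytes * fraction)
    rounded = (target // unit) * unit
    return max(rounded, unit)


def set_statement(size_bytes):
    """Build the SET GLOBAL statement using a plain byte count."""
    return f"SET GLOBAL innodb_buffer_pool_size = {size_bytes};"


if __name__ == "__main__":
    ram = 64 * GIB  # assumed host size, for illustration only
    for fraction in (0.90, 0.80, 0.75):
        size = buffer_pool_bytes(ram, fraction)
        print(f"{fraction:.0%} of RAM -> {size // GIB} GiB: "
              f"{set_statement(size)}")
```

On a 64 GiB host with those assumed settings this works out to 57, 51 and 48 GiB respectively. Treat the fraction as a starting point and watch actual memory use after the patches land.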
The above applies doubly to server consolidation and microservices in VMs.
Of course, if you’re an experienced production DBA, then you never trusted VMs anyway. 🙂
> Measureable: 8-12% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-12%. Examples include Oracle OLTP (tpm), MariaDB (sysbench), Postgres (pgbench), netperf (< 256 byte), fio (random IO to NvME).
> Modest: 3-7% – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measureable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005 w/ucode and SQLserver, and MongoDB.
I’ll leave it to others to pontificate on what it means when you can’t trust any desktop, server or mobile computer in an Internet-connected world. Or what HIPAA compliance means in the cloud where your server is a party-line telephone.
forums.aws.amazon.com: Degraded performance after forced reboot due to AWS instance maintenance, HN
ARM: Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism
Escaping Docker container using waitid() – CVE-2017-5123
theregister.co.uk: Azure VMs borked following Meltdown patch, er, meltdown
CPU hardware vulnerable to side-channel attacks (Replace CPU hardware), HN (I called this in advance, but there need to be two steps: re-design CPUs in 2018 if there’s no possible microcode update, then replace them in 2019)
blog.appoptics.com: Visualizing Meltdown on AWS
Intel alerted computer makers to chip flaws on Nov 29 – new claim – Total coincidence: That’s the same day Chipzilla’s CEO sold off his shares
zdnet.com: Researchers discover seven new Meltdown and Spectre attacks, HN discussion
phoronix.com: Bisected: The Unfortunate Reason Linux 4.20 Is Running Slower, HN
aws.amazon.com: Processor Speculative Execution Research Disclosure
forums.aws.amazon.com: Spectre/Meltdown Vulnerabilities – AWS please clarify
Potentially disastrous Rowhammer bitflips can bypass ECC protections, HN
Google Says Spectre And Meltdown Are Too Difficult To Fix
Intel VISA Exploit Gives Access to Computer’s Entire Data, Researchers Show
Keywords: Spectre, Specter, Meltdown, Rowhammer