There are at least four problems related to the ongoing, serious Meltdown and Spectre CPU security bugs (AWS announcement) that affect the Database Administrator (DBA):
- in shared environments, like AWS or VMs, neighbour VMs can read your data on unpatched systems. One way to protect your privacy is to provision an entire server for yourself.
- forthcoming patches might work, or might not work. Complex patches often don’t address the issue initially, so there could be a sequence of related patches (whack-a-mole, like Shellshock) that will affect database uptime and cache performance. Say good-bye to your 400-day uptimes!
- the patches are reported to consume more memory and reduce system performance. If your database server is configured, like with MySQL’s innodb_buffer_pool_size, to use 90% of RAM you should consider 80% or 75% to avoid OOMs.
- in AWS, significant clock skew has been reported, so add that to your monitoring.
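
For the clock skew item, here is a minimal sketch of a check you could wire into monitoring. It assumes Python and an NTP server you trust (pool.ntp.org below is only a placeholder). The function names and the 0.5-second alert threshold are hypothetical, picked for illustration, and not taken from any AWS tooling. It sends a single SNTP request and applies the standard NTP offset formula.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def ntp_offset(t0, t1, t2, t3):
    """Standard NTP clock-offset estimate, in seconds.

    t0: client send time, t1: server receive time,
    t2: server transmit time, t3: client receive time.
    A positive result means the local clock is behind the server.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def ntp_to_unix(seconds, fraction):
    """Convert a 32.32 fixed-point NTP timestamp to Unix seconds."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def measure_skew(server="pool.ntp.org", timeout=2.0):
    """Send one SNTP request and return the estimated offset in seconds.

    The default server is a placeholder assumption; use one you trust.
    """
    request = b"\x1b" + 47 * b"\x00"  # LI=0, VN=3, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
        t3 = time.time()
    # Receive timestamp is bytes 32-39, transmit timestamp is bytes 40-47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", reply[32:48])
    return ntp_offset(t0, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t3)


def skew_alert(offset_seconds, threshold_seconds=0.5):
    """True when the absolute skew exceeds the (hypothetical) threshold."""
    return abs(offset_seconds) > threshold_seconds
```

You would call measure_skew() from cron or your monitoring agent and alert when skew_alert() returns True. A single sample is noisy, so averaging a few is probably safer.
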
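Note: innodb_buffer_pool_size can be set dynamically in MySQL 5.7, with some caveats. SET does not accept the G suffix at runtime, so give the size in bytes or as an expression:
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;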
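
To put numbers on the 90% versus 75% advice above, here is a rough sketch that picks a buffer pool size as a fraction of RAM. One of the caveats is that MySQL adjusts the value to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances, so the sketch rounds down to that unit itself. The function names are hypothetical, and the 128 MiB chunk size and 8 instances are assumed defaults; check your own server's settings.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def buffer_pool_bytes(total_ram_bytes, fraction=0.75,
                      chunk_bytes=128 * MIB, instances=8):
    """Largest size <= fraction of RAM that is a whole multiple of
    chunk_bytes * instances (never less than one such unit)."""
    unit = chunk_bytes * instances
    target = int(total_ram_bytes * fraction)
    return max(unit, (target // unit) * unit)


def set_statement(size_bytes):
    """Build the SET GLOBAL statement with a plain byte count."""
    return "SET GLOBAL innodb_buffer_pool_size = {};".format(size_bytes)


# Example: a server with 64 GiB of RAM (an assumed figure).
ram = 64 * GIB
for fraction in (0.90, 0.80, 0.75):
    size = buffer_pool_bytes(ram, fraction)
    print(fraction, size // GIB, "GiB", set_statement(size))
```

Rounding down rather than letting the server round up keeps the pool at or under the fraction you chose, which is the point when the goal is to avoid OOMs.
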
The above applies doubly to server consolidation and microservices in VMs.
Of course, if you’re an experienced production DBA, then you never trusted VMs anyway.
Some numbers from Red Hat (paywalled):
> Measurable: 8-12% – Highly cached random memory, with buffered I/O, OLTP database workloads, and benchmarks with high kernel-to-user space transitions are impacted between 8-12%. Examples include Oracle OLTP (tpm), MariaDB (sysbench), Postgres (pgbench), netperf (< 256 byte), fio (random IO to NVMe).
> Modest: 3-7% – Database analytics, Decision Support System (DSS), and Java VMs are impacted less than the “Measurable” category. These applications may have significant sequential disk or network traffic, but kernel/device drivers are able to aggregate requests to moderate level of kernel-to-user transitions. Examples include SPECjbb2005 w/ucode and SQL Server, and MongoDB.
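
A back-of-the-envelope way to read those percentages: if a patched server delivers, say, 12% less throughput, the same workload needs proportionally more of the box. The sketch below assumes the slowdown applies uniformly to the workload, which is a simplification. The function name and the 85% starting utilization are made up for illustration.

```python
def utilization_after_slowdown(current_utilization, slowdown):
    """Utilization needed to serve the same load after a throughput loss.

    current_utilization: fraction of capacity in use today (0.85 = 85%).
    slowdown: fractional throughput loss from the patches (0.12 = 12%).
    """
    if not 0.0 <= slowdown < 1.0:
        raise ValueError("slowdown must be in [0, 1)")
    return current_utilization / (1.0 - slowdown)


# A server peaking at 85% today, across the quoted 3-12% range.
for slowdown in (0.03, 0.07, 0.08, 0.12):
    after = utilization_after_slowdown(0.85, slowdown)
    print("{:.0%} slowdown -> {:.1%} utilization".format(slowdown, after))
```

Anything that lands at or above 100% means the server can no longer carry its current peak without more capacity.
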
I’ll leave it to others to pontificate on what it means when you can’t trust any desktop, server or mobile computer in an Internet-connected world. Or what HIPAA compliance means in the cloud where your server is a party-line telephone.
forums.aws.amazon.com: Degraded performance after forced reboot due to AWS instance maintenance, HN
ARM: Vulnerability of Speculative Processors to Cache Timing Side-Channel Mechanism
Escaping Docker container using waitid() – CVE-2017-5123
CPU hardware vulnerable to side-channel attacks (Replace CPU hardware), HN (I called this in advance, but there need to be two steps: re-design CPUs in 2018 if there’s no possible microcode update, then replace them in 2019)