Copy Fail: 732 bytes to root on almost every Linux server since 2017

A subtle 2017 optimisation in the Linux kernel's crypto layer has just been turned into a reliable, 732-byte path from any user account to root. Here's what it means for your servers and what to do right now.

A quiet performance tweak that has sat in the kernel for nine years has just collapsed into a deterministic, one-shot path from any unprivileged account to full root. No races. No leaks. No distribution-specific payloads. The same 732-byte Python script does the job on Ubuntu 24.04, RHEL 10.1, Amazon Linux 2023, SUSE 16 and everything in between.

The vulnerability, tracked as CVE-2026-31431 and dubbed Copy Fail, was disclosed on 29 April 2026 by researchers at Theori, who found it with their Xint Code scanner. It lives in algif_aead, the AEAD part of the kernel's userspace crypto interface (AF_ALG). The root cause traces to commit 72548b093ee3 in 2017, which switched AEAD operations to in-place processing for speed. That change let page-cache pages from readable files end up in a writable scatterlist. Chain it with splice() and a carefully crafted AEAD request, and an attacker can overwrite four controlled bytes in the in-memory copy of any file the kernel has cached, including setuid binaries such as /usr/bin/su.

The exploit poisons the cached binary, then executes it. Because the binary is setuid root, the tampered copy runs with root privileges. The change never touches disk, so sha256sum, AIDE and most file-integrity tools see nothing, and the poisoned page stays in memory until it is evicted or the machine reboots. This is the opposite of the usual messy local privilege escalation: it is reliable, portable and leaves almost no forensic trace while the cache is hot.
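Since the attack targets the cached copy of setuid binaries, it can help to know which ones a host actually exposes. The following is a minimal sketch, not part of the original advisory; list_setuid is an illustrative helper name, and it lists every setuid file regardless of owner.

```bash
#!/bin/bash
# List setuid files under a directory tree (default: /), staying on one filesystem.
# list_setuid is an illustrative helper name, not part of any existing tool.
list_setuid() {
    find "${1:-/}" -xdev -type f -perm -4000 2>/dev/null
}

# Example: show setuid binaries in the usual system path.
list_setuid /usr/bin || true
```

Any file in that list that an unprivileged user can read is a plausible target of the kind described above, so a shorter list means a smaller attack surface.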

What this means for users and self-hosters

If you run any Linux server that accepts local logins or SSH, hosts a web app with shell access, or acts as a CI runner, container host or shared VPS, this matters. A compromised low-privilege account, a supply-chain hit on a dependency or even a malicious container process can now escalate in seconds. The page cache is shared between host and containers on the same kernel, so the blast radius crosses isolation boundaries more easily than most people expect.

For self-hosters the implications are blunt. The old assumption that local user equals contained risk is gone until you patch. Multi-tenant setups, development boxes with untrusted code or anything exposed to the public internet via a web service are now higher priority. Single-user laptops or air-gapped boxes are lower risk but still worth fixing. The script is trivial to run if an attacker ever gets a shell.

What to do right now on Linux servers

Step 1: apply the universal temporary mitigation (do this today)

echo 'install algif_aead /bin/false' | sudo tee /etc/modprobe.d/disable-algif_aead.conf
sudo rmmod algif_aead 2>/dev/null || true

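This blacklists the vulnerable module: the install line makes every attempt to load it run /bin/false instead, and rmmod unloads it if it is already loaded. It has negligible impact on normal workloads. SSH, TLS termination, LUKS, IPsec and OpenSSL continue to work. The only things that break are rare userspace programs that explicitly use AF_ALG, for example through OpenSSL's afalg engine. Check with ss -xa | grep alg or lsof if you are worried.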
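Another way to see whether anything is holding the module right now is its reference count in /proc/modules, the same figure lsmod shows under "Used by". This is a minimal sketch under that assumption; module_refcount is a hypothetical helper name, and its optional second argument exists only so it can be pointed at a sample file.

```bash
#!/bin/bash
# Print the reference count of a loaded module, or nothing if it is not loaded.
# /proc/modules columns: name size refcount dependants state address
module_refcount() {
    local file="${2:-/proc/modules}"
    [ -r "$file" ] || return 0
    awk -v m="$1" '$1 == m { print $3 }' "$file"
}

count=$(module_refcount algif_aead)
case "$count" in
    "") echo "algif_aead is not loaded" ;;
    0)  echo "algif_aead is loaded but idle; rmmod should succeed" ;;
    *)  echo "algif_aead reports refcount '$count'; find the user before unloading" ;;
esac
```

A non-zero count suggests some process has an AEAD socket open, which is also the case in which rmmod would refuse to unload the module.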

Wait, will blacklisting actually work for me?

Some distros, such as CloudLinux, build the module directly into the kernel, so the modprobe blacklist has nothing to block. You can check which case applies to you with a script like the following:

#!/bin/bash
# Check if algif_aead is built-in or loadable

MODULE="algif_aead"
BUILTIN_FILE="/lib/modules/$(uname -r)/modules.builtin"

if grep -qF "/${MODULE}.ko" "$BUILTIN_FILE" 2>/dev/null; then
    echo "🚨 WARNING: ${MODULE} is built directly into your kernel."
    echo "The modprobe blacklist WILL NOT WORK. You must update your kernel or use seccomp."
elif modinfo "$MODULE" >/dev/null 2>&1; then
    echo "✅ GOOD: ${MODULE} is a loadable module."
    echo "The modprobe blacklist mitigation will protect this system."
else
    echo "ℹ️ INFO: ${MODULE} not found at all."
    echo "Your kernel may not have AF_ALG compiled in, but verify with your distro."
fi
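As a cross-check, the kernel build configuration can be read directly. The option that builds algif_aead is CONFIG_CRYPTO_USER_API_AEAD; the sketch below assumes the config is installed at /boot/config-$(uname -r), which is common but not universal, and aead_config_state is an illustrative name.

```bash
#!/bin/bash
# Report how CONFIG_CRYPTO_USER_API_AEAD (the option that builds algif_aead) is set.
# Prints one of: builtin, module, disabled, unknown.
aead_config_state() {
    local cfg="${1:-/boot/config-$(uname -r)}"
    # Assumption: the distro ships its kernel config at this path.
    [ -r "$cfg" ] || { echo unknown; return 0; }
    case "$(grep -E '^CONFIG_CRYPTO_USER_API_AEAD=' "$cfg")" in
        *=y) echo builtin ;;
        *=m) echo module ;;
        *)   echo disabled ;;
    esac
}

aead_config_state
```

A result of builtin means the modprobe blacklist cannot help, module means it can, and unknown only means the config file was not found where this sketch looked.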

Step 2: update the kernel

Run your normal update command and reboot:

Debian or Ubuntu: sudo apt update && sudo apt full-upgrade
RHEL, AlmaLinux, Rocky or Fedora: sudo dnf update (or sudo yum update on older releases)
SUSE: sudo zypper patch

A reboot also clears any poisoned page cache. As of 1 May 2026, full kernel packages are landing unevenly across distros, so the module blacklist buys you time until yours arrives.
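An updated kernel package does nothing until the machine boots into it. A rough way to spot a pending reboot is to compare the running kernel with the newest module tree under /lib/modules. This is a heuristic sketch: it assumes GNU or BusyBox sort with -V, and newest_installed_kernel is a made-up helper name.

```bash
#!/bin/bash
# Print the highest kernel version that has a module tree installed.
newest_installed_kernel() {
    ls -1 "${1:-/lib/modules}" 2>/dev/null | sort -V | tail -n 1
}

running=$(uname -r)
newest=$(newest_installed_kernel)
if [ -z "$newest" ]; then
    echo "Could not list /lib/modules; check your installed kernels manually."
elif [ "$running" != "$newest" ]; then
    echo "Running $running but $newest is installed: reboot to pick it up."
else
    echo "Running kernel $running is the newest installed module tree."
fi
```

Version sorting can be fooled by leftover directories or unusual naming, so treat a mismatch as a prompt to check rather than as proof.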

Step 3: container and orchestration hardening

On Kubernetes or Docker hosts, add a seccomp profile that blocks AF_ALG socket creation for untrusted workloads, or roll out the same module blacklist via a DaemonSet on the nodes. OVHcloud has already published a tested DaemonSet for their Managed Kubernetes Service.
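A seccomp rule of this kind might look like the sketch below, which writes a profile in the JSON format Docker accepts. It rejects socket() calls whose first argument is 38, the value of AF_ALG in the Linux headers, and allows everything else. Two assumptions to be aware of: the file name is arbitrary, and an allow-by-default profile replaces Docker's stricter default profile, so in production the rule is better merged into the default profile than used alone. 32-bit x86 workloads that reach sockets through socketcall are not covered.

```bash
#!/bin/bash
# Write a seccomp profile that rejects socket(AF_ALG, ...) and allows everything else.
# 38 is the numeric value of AF_ALG on Linux.
# The default path is an arbitrary choice for this example.
PROFILE="${PROFILE:-./block-af-alg.json}"

cat > "$PROFILE" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
EOF

echo "Wrote $PROFILE"
# Example use with Docker (the image name is a placeholder):
#   docker run --rm --security-opt seccomp="$PROFILE" your-image
```

With the profile applied, a process in the container that tries to open an AF_ALG socket should get an error instead of a file descriptor, whether or not the module is loaded on the host.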

Step 4: verify and monitor

Once the blacklist is in place, confirm that the exploit no longer works; the PoC is public on GitHub. Watch for unusual su or sudo invocations, and consider adding runtime detection rules if you already run something like Sysdig or Falco.
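If running the PoC on a production host is not appealing, a lighter check is to confirm that the install override exists and that the module is not loaded. This is a minimal sketch with a hypothetical function name, and its arguments are overridable only for testing. It does not cover kernels with the module built in and is not proof that the host is unexploitable.

```bash
#!/bin/bash
# Exit 0 only if an "install algif_aead /bin/false" override exists
# and algif_aead does not appear in the loaded module list.
mitigation_in_place() {
    local conf_dir="${1:-/etc/modprobe.d}" modules="${2:-/proc/modules}"
    grep -qsE '^[[:space:]]*install[[:space:]]+algif_aead[[:space:]]+/bin/false' \
        "$conf_dir"/*.conf || return 1
    ! grep -qs '^algif_aead ' "$modules"
}

if mitigation_in_place; then
    echo "Override present and algif_aead not loaded."
else
    echo "Mitigation incomplete: check /etc/modprobe.d and lsmod."
fi
```

A check like this is cheap enough to run from configuration management on every node, which also catches hosts where the override was later removed.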

What the distros and providers are actually doing

Ubuntu released a kmod update (USN-8226-1) that disables the module fleet-wide until kernel packages arrive. The fix commit a664bf3d603d reverts the 2017 in-place optimisation.

Red Hat rates the issue "Important (CVSS 7.8)". No errata are out yet for RHEL 8, 9 or 10, but Red Hat documents the same module blacklist plus boot-parameter options. AlmaLinux and CloudLinux have moved faster: patched kernels are in testing and KernelCare livepatches are rolling out automatically for supported versions.

SUSE calls it "Important" and has fixes in QA for most supported products, with a public blog post acknowledging the severity and the reliability of the exploit.

Amazon Linux, Debian, Arch and others have public trackers and are following the same pattern: module disable first, kernel update second.
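Once kernel updates do arrive, one way to tell whether the installed package carries the fix is to search its changelog for the CVE identifier. The sketch below assumes an RPM-based system, that the kernel package is named kernel-$(uname -r), and that the vendor records CVE numbers in the changelog; none of this is guaranteed, and changelog_mentions_cve is an illustrative name.

```bash
#!/bin/bash
# Succeeds if the changelog text on stdin mentions the given CVE.
changelog_mentions_cve() {
    grep -q "${1:-CVE-2026-31431}"
}

# RPM-based systems: rpm -q --changelog prints an installed package's changelog.
# Assumption: the running kernel's package is called kernel-<release>.
if command -v rpm >/dev/null 2>&1; then
    if rpm -q --changelog "kernel-$(uname -r)" 2>/dev/null | changelog_mentions_cve; then
        echo "Installed kernel changelog mentions CVE-2026-31431."
    else
        echo "No mention found; the fix may not be installed yet."
    fi
fi
```

A missing mention is not conclusive, since package names and changelog habits differ between vendors, so your distro's security tracker remains the authoritative source.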

Hosting providers have been pragmatic. OVHcloud is shipping patched MKS versions and gave customers an immediate DaemonSet. Several managed VPS and cPanel hosts (InterServer, BoxToPlay, myhost.nz and others) applied the blacklist across their fleets within hours and notified customers. Unmanaged VPS users are on their own, exactly as expected. If your provider has not contacted you, assume the worst and run the one-liner yourself.

The verdict

Copy Fail is not theoretical and it is not someone else's problem. It is a clean logic error that survived nine years because the page-cache assumptions in the crypto path were never stress-tested against an unprivileged attacker who could control both ends of a splice. The researchers turned an optimisation into a reliable primitive with one afternoon of AI-assisted scanning.

For anyone self-hosting Linux, the takeaway is simple. Treat every local account as potentially hostile until the kernel is updated. The mitigation costs nothing and the patch is straightforward. Apply both today. The window is open and the exploit is already in the wild. Watch your distro's security feed and consider livepatching options if reboots are painful. Next time an optimisation lands, perhaps we will scrutinise the page-cache assumptions a bit harder.