
CVE-2025-39989: Vulnerability in Linux Linux

High
Published: Fri Apr 18 2025 (04/18/2025, 07:01:39 UTC)
Source: CVE
Vendor/Project: Linux
Product: Linux

Description

In the Linux kernel, the following vulnerability has been resolved:

x86/mce: use is_copy_from_user() to determine copy-from-user context

Patch series "mm/hwpoison: Fix regressions in memory failure handling", v4.

## 1. What am I trying to do:

This patchset resolves two critical regressions related to memory failure handling that have appeared in the upstream kernel since version 5.17, as compared to 5.10 LTS.

- copyin case: poison found in a user page while the kernel is copying from user space
- instr case: poison found while fetching instructions in user space

## 2. What is the expected outcome and why

- For copyin case:

The kernel can recover from poison found where it is doing get_user() or copy_from_user(), provided those places get an error return and the kernel returns -EFAULT to the process instead of crashing. More specifically, the MCE handler checks the fixup handler type to decide whether an in-kernel #MC can be recovered. When EX_TYPE_UACCESS is found, the PC jumps to the recovery code specified in _ASM_EXTABLE_FAULT() and -EFAULT is returned to user space.

- For instr case:

If poison is found while fetching instructions in user space, full recovery is possible. The user process takes a #PF, and Linux allocates a new page and fills it by reading from storage.

## 3. What actually happens and why

- For copyin case: kernel panic since v5.17

Commit 4c132d1d844a ("x86/futex: Remove .fixup usage") introduced a new extable fixup type, EX_TYPE_EFAULT_REG, and later patches updated the extable fixup type for copy-from-user operations, changing it from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. This breaks the previous EX_TYPE_UACCESS handling when poison is found in get_user() or copy_from_user().

- For instr case: user process is killed by a SIGBUS signal due to #CMCI and #MCE race

When an uncorrected memory error is consumed, there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting an SRAR signature machine check when the data is about to be consumed.

### Background: why *UN*corrected errors are tied to *C*MCI on Intel platforms [1]

Prior to Icelake, memory controllers reported patrol scrub events that detected a previously unseen uncorrected error in memory by signaling a broadcast machine check with an SRAO (Software Recoverable Action Optional) signature in the machine check bank. This was overkill, because it is not an urgent problem when no core is on the verge of consuming that bad data. It was also found that multiple SRAO UCEs may cause nested MCE interrupts and finally become an IERR.

Hence, Intel downgraded the machine check bank signature of patrol scrub from SRAO to UCNA (Uncorrected, No Action required), and changed the signal to #CMCI. Just to add to the confusion, Linux does take an action (in uc_decode_notifier()) to try to offline the page despite the UC*NA* signature name.

### Background: why #CMCI and #MCE race when poison is being consumed on Intel platforms [1]

Having decided that CMCI/UCNA is the best action for patrol scrub errors, the memory controller uses it for reads too. But the memory controller executes asynchronously from the core, and cannot tell the difference between a "real" read and a speculative read. So it will do CMCI/UCNA if an error is found in any read.

Thus:

1) Core is clever and thinks address A is needed soon, issues a speculative read.
2) Core finds it is going to use address A soon after sending the read request.
3) The CMCI from the memory controller is in a race with the MCE from the core that will soon try to retire the load from address A.

Quite often (because speculation has got better) the CMCI from the memory controller is delivered before the core is committed to the instruction reading address A, so the interrupt is taken, and Linux offlines the page (marking it as poison).

## Why user process is killed for instr case

Commit 046545a661af ("mm/hwpoison: fix error page recovered but reported "not ---truncated---
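The copyin regression described above comes down to a recoverability decision that was keyed on one exception-table fixup type. The following is a minimal user-space sketch of that idea only, not the kernel's code: `struct ex_entry`, its `is_copy_from_user` field, and both helper functions are hypothetical names invented for illustration, and the enum values are arbitrary. The boolean field merely stands in for whatever the kernel's `is_copy_from_user()` helper determines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixup types named in the description; numeric values are arbitrary here. */
enum ex_type { EX_TYPE_DEFAULT, EX_TYPE_UACCESS, EX_TYPE_EFAULT_REG };

/* Hypothetical, simplified stand-in for an exception-table entry. */
struct ex_entry {
    enum ex_type type;
    /* Assumption: models the answer of a copy-from-user context check. */
    bool is_copy_from_user;
};

/*
 * Decision keyed on the fixup type alone. Once copy-from-user sites are
 * relabelled from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG, this stops
 * recognising them as recoverable.
 */
bool recoverable_by_type(const struct ex_entry *e)
{
    assert(e != NULL);
    return e->type == EX_TYPE_UACCESS;
}

/*
 * Decision keyed on the access context instead of the type label, which
 * is the direction the commit title points to.
 */
bool recoverable_by_context(const struct ex_entry *e)
{
    assert(e != NULL);
    return e->is_copy_from_user;
}
```

With an entry of type `EX_TYPE_EFAULT_REG` that really is a copy from user space, `recoverable_by_type()` returns false (the panic path in this model) while `recoverable_by_context()` returns true.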

AI-Powered Analysis

Last updated: 07/03/2025, 19:41:22 UTC

Technical Analysis

CVE-2025-39989 is a vulnerability in the Linux kernel's memory failure handling, specifically in the x86 Machine Check Exception (MCE) subsystem and its recovery from poisoned memory. It stems from regressions that have appeared in the upstream kernel since version 5.17, relative to the 5.10 LTS baseline. Two cases are involved: the 'copyin' case, where poison is found in a user page while the kernel copies from user space with functions such as get_user() or copy_from_user(), and the 'instr' case, where poison is found during instruction fetch in user space.

In the 'copyin' case, the kernel is supposed to recover by having the faulting access return an error, so that the process receives -EFAULT instead of the system crashing. The MCE handler decides whether an in-kernel machine check is recoverable by checking the exception table fixup type. Commit 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 ("x86/futex: Remove .fixup usage") introduced a new fixup type, EX_TYPE_EFAULT_REG, and later patches switched copy-from-user operations from EX_TYPE_UACCESS to EX_TYPE_EFAULT_REG. This broke the existing EX_TYPE_UACCESS handling, so since v5.17 poison encountered during these operations results in a kernel panic.

In the 'instr' case, the user process is killed by a SIGBUS signal because of a race between the Corrected Machine Check Interrupt (CMCI) and the Machine Check Exception (MCE). Intel memory controllers run asynchronously from the core and report an uncorrected error found on any read, including speculative reads, with a UCNA signature via CMCI. The CMCI is often delivered before the core is committed to the instruction that reads the affected address, so Linux offlines the page and marks it as poison before the core raises its own machine check. The outcome is that the process is terminated even though full recovery would be possible by taking a page fault and refilling a new page from storage.

The vulnerability is rooted in the interaction between hardware error reporting on Intel platforms and the kernel's error handling code. The issue affects kernel versions including and after commits 4c132d1d844a53fc4e4b5c34e36ef10d6124b783 and 88eded8104d2ca0429703755dd250f8cbecc1447.

No known exploits are reported in the wild yet, and no CVSS score has been assigned. The vulnerability can cause kernel panics and unexpected termination of user processes, which affects the reliability and availability of Linux systems running affected kernels, especially on Intel hardware with these memory error reporting features.
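The expected copyin behaviour, an error return of -EFAULT rather than a crash, is the same contract user space already sees when it hands the kernel an unreadable buffer. The sketch below illustrates that contract only; it does not inject or simulate hardware poison. It assumes a Linux system, and the function name `errno_from_unreadable_buffer` is made up for this example.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to copy from a user buffer it cannot read, and report the
 * errno it hands back. Returns 0 if write() unexpectedly succeeds and -1 if
 * the setup (mmap or pipe) fails.
 */
int errno_from_unreadable_buffer(void)
{
    int fds[2];
    int result = -1;
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);

    /* A PROT_NONE mapping is a valid address range with no read permission. */
    void *buf = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    if (pipe(fds) == 0) {
        /* Writing to a pipe forces the kernel to copy bytes out of buf. */
        ssize_t n = write(fds[1], buf, 16);
        result = (n < 0) ? errno : 0; /* capture errno before close() */
        close(fds[0]);
        close(fds[1]);
    }

    munmap(buf, (size_t)page);
    return result;
}
```

On a healthy kernel the function is expected to return `EFAULT`. A pipe is used rather than `/dev/null` because writing to `/dev/null` does not read the source buffer.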

Potential Impact

For European organizations, this vulnerability poses risks to the stability and reliability of Linux-based infrastructure, which is widely used across sectors such as finance, telecommunications, government, and critical infrastructure. Kernel panics caused by poisoned memory during copy-from-user operations can lead to unexpected downtime, affecting service availability and potentially causing data loss if systems are not properly backed up or if critical processes are interrupted. Termination of user processes after instruction fetch poisoning can disrupt applications and cause service interruptions or degraded performance.

Organizations relying on Intel-based servers and workstations running affected Linux kernel versions are particularly exposed. This includes cloud service providers, data centers, and enterprises with on-premises Linux deployments. Because the issue depends on hardware-software interaction, it may also complicate incident response and troubleshooting, increasing operational costs. The issue has been resolved in the Linux kernel, so the remaining exposure is on systems that have not yet received a kernel containing the fix; such systems need interim mitigations to maintain stability. The vulnerability could also indirectly affect confidentiality and integrity if system crashes lead to improper handling of sensitive data or incomplete transactions.

Mitigation Recommendations

1. Immediate kernel upgrade: Prioritize upgrading to a Linux kernel version in which this vulnerability is fixed. Monitor the Linux kernel mailing lists and vendor advisories for patches and backports addressing this issue.
2. Hardware firmware updates: Coordinate with hardware vendors, especially Intel, to ensure that any firmware and microcode updates relevant to memory error reporting and MCE handling are applied.
3. Memory error monitoring: Monitor hardware memory errors using tools such as mcelog, rasdaemon, or equivalent to detect early signs of memory poisoning and proactively replace faulty memory modules.
4. System hardening: Disable speculative execution features only if possible and practical. Speculative reads contribute to the race that causes user process termination, but disabling speculation is rarely practical on production systems and is not a substitute for the kernel fix.
5. Application-level resilience: Design critical applications to handle unexpected process termination gracefully, including robust checkpointing and recovery mechanisms.
6. Segregation and redundancy: Deploy critical workloads on redundant systems or clusters to minimize the impact of kernel panics and process kills.
7. Testing and validation: Before deploying kernel updates, test them thoroughly in staging environments to ensure compatibility and stability, especially on systems with custom kernel modules or specialized hardware.
8. Incident response readiness: Prepare incident response plans to quickly address kernel panics and process terminations, including automated system reboots and alerting mechanisms.
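As a first triage step for the kernel upgrade recommendation, it can help to check whether a host's kernel release is at or above 5.17, the version from which the description says the regressions appeared. The sketch below does only that. It is an assumption-laden helper with made-up function names, and it cannot tell whether a given kernel already carries the fix or a vendor backport, so a positive result means "check the vendor advisory", not "vulnerable".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Returns true if a release string of the form "MAJOR.MINOR..." is 5.17 or
 * newer. Strings that do not start with two dotted integers yield false.
 */
bool release_in_regression_window(const char *release)
{
    int major = 0;
    int minor = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return false;
    return major > 5 || (major == 5 && minor >= 17);
}

/* Convenience wrapper that applies the check to the running kernel. */
bool running_kernel_in_regression_window(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return release_in_regression_window(u.release);
}
```

For example, `release_in_regression_window("5.10.0-generic")` is false and `release_in_regression_window("6.8.0-45-generic")` is true. Distribution kernels often backport fixes without changing the upstream version number, which is why the result should only be used to decide which hosts need a closer look.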


Technical Details

Data Version: 5.1
Assigner Short Name: Linux
Date Reserved: 2025-04-16T07:20:57.150Z
CISA Enriched: false
CVSS Version: null
State: PUBLISHED

Threat ID: 682d9820c4522896dcbdd490

Added to database: 5/21/2025, 9:08:48 AM

Last enriched: 7/3/2025, 7:41:22 PM

Last updated: 8/15/2025, 2:22:28 AM

