
CVE-2025-21732: Vulnerability in the Linux kernel

Medium
Published: Thu Feb 27 2025 (02/27/2025, 02:12:10 UTC)
Source: CVE
Vendor/Project: Linux
Product: Linux

Description

In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error

This patch addresses a race condition for an ODP MR that can result in a CQE with an error on the UMR QP.

During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs:

    mlx5_revoke_mr()
    mlx5r_umr_revoke_mr()
    mlx5r_umr_post_send_wait()

At this point, the lkey is freed from the hardware's perspective.

However, concurrently, mlx5_ib_invalidate_range() might be triggered by another task attempting to invalidate a range for the same freed lkey.

This task will:
- Acquire the umem_odp->umem_mutex lock.
- Call mlx5r_umr_update_xlt() on the UMR QP.
- Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state [1].

To resolve this race condition, the umem_odp->umem_mutex lock is now also acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke, we set umem_odp->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey.

[1] From dmesg:

    infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error
    cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2

    WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
    Modules linked in: ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
    CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
    [..]
    Call Trace:
    <TASK>
    mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]
    mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]
    __mmu_notifier_invalidate_range_start+0x1e1/0x240
    zap_page_range_single+0xf1/0x1a0
    madvise_vma_behavior+0x677/0x6e0
    do_madvise+0x1a2/0x4b0
    __x64_sys_madvise+0x25/0x30
    do_syscall_64+0x6b/0x140
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
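The first line of the dmesg excerpt has a recognizable shape, so a log scan can flag it. The following is a minimal sketch, assuming other kernels print the message in the same format as the single line quoted above (this is not guaranteed). The names WC_ERROR_RE and find_wc_errors are hypothetical helpers introduced for illustration, not part of any kernel or RDMA tooling.

```python
import re

# Pattern modeled on the one dmesg line quoted in the advisory.
# Assumption: other kernel versions use the same message layout.
WC_ERROR_RE = re.compile(
    r"infiniband (?P<device>\S+): dump_cqe:\d+:\(pid \d+\): "
    r"WC error: (?P<code>\d+), Message: (?P<message>.+)"
)


def find_wc_errors(log_text):
    """Return (device, code, message) for every matching line in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = WC_ERROR_RE.search(line)
        if match:
            hits.append(
                (match["device"], int(match["code"]), match["message"].strip())
            )
    return hits


if __name__ == "__main__":
    sample = (
        "infiniband rocep8s0f0: dump_cqe:277:(pid 0): "
        "WC error: 6, Message: memory bind operation error"
    )
    for device, code, message in find_wc_errors(sample):
        print(device, code, message)
```

The function takes the log text as a string, so it can be fed from whatever source is available on the system (for example saved kernel log output) without assuming a particular collection tool.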

AI-Powered Analysis

Last updated: 06/30/2025, 08:39:45 UTC

Technical Analysis

CVE-2025-21732 is a race condition in the Linux kernel's RDMA (Remote Direct Memory Access) subsystem, specifically in the Mellanox mlx5 InfiniBand driver (mlx5_ib). The flaw lies in how On-Demand Paging Memory Regions (ODP MRs) are deregistered while their memory keys (lkeys) can still be the target of an invalidation.

During the deregistration flow (__mlx5_ib_dereg_mr), the driver revokes the MR by calling mlx5_revoke_mr and related functions, after which the lkey is freed from the hardware's perspective. Concurrently, another task may attempt to invalidate a range for the same lkey via mlx5_ib_invalidate_range. That task acquires the umem_odp->umem_mutex lock and calls mlx5r_umr_update_xlt to update the translation table for the now-freed lkey. Because the lkey no longer exists in hardware, the update can complete with a Completion Queue Entry (CQE) error on the User-mode Memory Registration (UMR) Queue Pair (QP). The UMR QP then enters an error state, which can disrupt RDMA communications. The failure appears in the kernel log as "WC error: 6, Message: memory bind operation error", followed by a warning in mlx5r_umr_post_send_wait.

The fix extends the scope of umem_odp->umem_mutex to cover the revoke operation. After a successful revoke, umem_odp->private (the pointer to the MR) is set to NULL, so no later invalidation can act on the freed lkey.

The vulnerability affects Linux kernel versions whose mlx5_ib driver includes ODP MR support, and it is relevant to systems that use Mellanox hardware for high-performance networking and RDMA workloads.
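The locking pattern behind the fix can be modeled in user space. The sketch below is a toy Python model, not the kernel code: OdpMr, UmemOdp, revoke_mr, invalidate_range, and run_race are hypothetical names chosen to mirror the identifiers in the description. It also assumes that the invalidation path checks the pointer for NULL under the mutex, which the description implies but does not show.

```python
import threading


class OdpMr:
    """Toy stand-in for an ODP memory region (not the kernel structure)."""

    def __init__(self, lkey):
        self.lkey = lkey
        self.revoked = False


class UmemOdp:
    """Toy stand-in for umem_odp: a mutex plus a back-pointer to the MR."""

    def __init__(self, mr):
        self.umem_mutex = threading.Lock()
        self.private = mr


def revoke_mr(umem_odp):
    """Model of the fixed revoke path: revoke and clear the pointer under the mutex."""
    with umem_odp.umem_mutex:
        mr = umem_odp.private
        if mr is None:
            return False
        mr.revoked = True        # the lkey is now freed from the hardware's view
        umem_odp.private = None  # later invalidations can no longer reach the MR
        return True


def invalidate_range(umem_odp, events):
    """Model of the invalidation path: skip the update when the MR is gone."""
    with umem_odp.umem_mutex:
        mr = umem_odp.private
        if mr is None:           # assumed NULL check, see the lead-in
            events.append("skipped")
            return
        # "error" would model the CQE error; it is unreachable with the fix,
        # because revoke_mr clears the pointer in the same critical section.
        events.append("error" if mr.revoked else "updated")


def run_race(umem_odp, invalidators=4):
    """Run one revoke concurrently with several invalidations; return the outcomes."""
    events = []
    threads = [threading.Thread(target=revoke_mr, args=(umem_odp,))]
    threads += [
        threading.Thread(target=invalidate_range, args=(umem_odp, events))
        for _ in range(invalidators)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


if __name__ == "__main__":
    print(run_race(UmemOdp(OdpMr(lkey=0x1234))))
```

Each invalidation either runs before the revoke and records "updated", or runs after it and records "skipped". The "error" outcome would require revoke and pointer clearing to happen outside the mutex, which is the situation the patch removes.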

Potential Impact

For European organizations, especially those operating in high-performance computing, data centers, cloud infrastructure, and financial services that rely on RDMA-enabled Linux servers with Mellanox hardware, this vulnerability can cause service disruptions through errors on the UMR Queue Pair. The resulting CQE errors can make RDMA connections fail or degrade performance, affecting applications that depend on low-latency, high-throughput networking such as distributed databases, virtualization platforms, and HPC clusters.

The vulnerability does not directly lead to privilege escalation or remote code execution, but the resulting instability can cause denial of service or interrupt data transfers. Organizations whose critical infrastructure uses RDMA for storage or inter-node communication may see degraded reliability and increased operational risk. No exploits are known in the wild, which lowers the immediate threat. Unpatched systems nonetheless remain exposed: the race can be hit accidentally (the reported trace reaches it through an madvise() system call issued by ibv_rc_pingpong) or could be triggered deliberately, and either case can lead to operational outages.

Mitigation Recommendations

European organizations should prioritize updating their Linux kernel to a version that includes the patch for CVE-2025-21732, and deploy it on all RDMA-enabled servers that use Mellanox mlx5 hardware. System administrators should audit their environments to identify systems that load the mlx5_ib driver and use ODP MRs. Where immediate patching is not feasible, consider temporarily disabling ODP MR support or restricting RDMA usage to minimize exposure.

Monitoring kernel logs for CQE errors from mlx5_ib can help detect occurrences of the race. Scheduling workloads so that memory deregistration and range invalidation on the same region are less likely to overlap may also reduce the risk, although it does not remove it. Consult the hardware vendor for firmware updates and configuration best practices, bearing in mind that the fix described here is a kernel driver change. Finally, track this vulnerability in regular vulnerability management and patching cycles.
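As a first step in such an audit, a host can be checked for a loaded mlx5_ib module. The sketch below parses text in the /proc/modules format, where the module name is the first whitespace-separated field of each line. The helper names loaded_modules and uses_mlx5_ib are hypothetical. A loaded driver does not prove that ODP MRs are in use, so this only narrows the list of hosts to inspect further.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def uses_mlx5_ib(proc_modules_text):
    """True when the mlx5_ib driver appears as a loaded module."""
    return "mlx5_ib" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            print("mlx5_ib loaded:", uses_mlx5_ib(handle.read()))
    except OSError:
        # /proc/modules is Linux-specific and may be absent or unreadable.
        print("could not read /proc/modules")
```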

Technical Details

Data Version: 5.1
Assigner Short Name: Linux
Date Reserved: 2024-12-29T08:45:45.756Z
Cisa Enriched: false
Cvss Version: null
State: PUBLISHED

Threat ID: 682d9832c4522896dcbe860c

Added to database: 5/21/2025, 9:09:06 AM

Last enriched: 6/30/2025, 8:39:45 AM

Last updated: 8/20/2025, 9:12:54 PM
