CVE-2025-21732: Vulnerability in the Linux kernel
In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error

This patch addresses a race condition for an ODP MR that can result in a CQE with an error on the UMR QP.

During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs:

mlx5_revoke_mr()
  mlx5r_umr_revoke_mr()
    mlx5r_umr_post_send_wait()

At this point, the lkey is freed from the hardware's perspective.

However, concurrently, mlx5_ib_invalidate_range() might be triggered by another task attempting to invalidate a range for the same freed lkey. This task will:
- Acquire the umem_odp->umem_mutex lock.
- Call mlx5r_umr_update_xlt() on the UMR QP.
- Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state [1].

To resolve this race condition, the umem_odp->umem_mutex lock is now also acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke, we set umem_odp->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey.

[1] From dmesg:

  infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error
  cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2

  WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
  Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core
  CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]
  [..]
  Call Trace:
  <TASK>
  mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]
  mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]
  __mmu_notifier_invalidate_range_start+0x1e1/0x240
  zap_page_range_single+0xf1/0x1a0
  madvise_vma_behavior+0x677/0x6e0
  do_madvise+0x1a2/0x4b0
  __x64_sys_madvise+0x25/0x30
  do_syscall_64+0x6b/0x140
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
AI Analysis
Technical Summary
CVE-2025-21732 is a race condition in the Linux kernel's RDMA (Remote Direct Memory Access) subsystem, specifically in the Mellanox mlx5 InfiniBand driver (mlx5_ib). The flaw lies in how On-Demand Paging memory regions (ODP MRs) are deregistered while their pages are being invalidated. During the deregistration flow (__mlx5_ib_dereg_mr()), the driver calls mlx5_revoke_mr(), which posts a UMR work request and waits for it to complete; once it does, the MR's memory key (lkey) is freed from the hardware's perspective.

Concurrently, another task may trigger mlx5_ib_invalidate_range() for the same MR; in the reported trace this comes from an MMU notifier fired by madvise(). That task acquires umem_odp->umem_mutex and calls mlx5r_umr_update_xlt() to update the translation table of the now-freed lkey. The hardware completes the request with an error Completion Queue Entry (CQE) on the UMR Queue Pair (QP), where UMR stands for User-mode Memory Registration, and the QP enters an error state, potentially disrupting RDMA communication. The failure appears in the kernel log as "WC error: 6, Message: memory bind operation error", followed by a warning in mlx5r_umr_post_send_wait().

The fix also takes umem_odp->umem_mutex within the mlx5_revoke_mr() scope, so that revocation and invalidation are serialized, and sets umem_odp->private (the pointer back to the MR) to NULL after a successful revoke so that later invalidations no longer act on the freed lkey. The issue affects kernels whose mlx5_ib driver supports ODP MRs and lacks this fix, and it is relevant to systems using Mellanox hardware for high-performance networking and RDMA workloads.
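The locking change described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the classes `Hardware`, `MemoryRegion`, `UmemOdp` and `WorkQueueError`, and the `fixed` flag, are hypothetical names invented for illustration, and the NULL check in `invalidate_range()` is an assumption about how the cleared pointer stops further invalidations.

```python
import threading


class WorkQueueError(RuntimeError):
    """Stands in for the CQE error reported on the UMR QP (hypothetical)."""


class Hardware:
    """Toy model of the device's view of which lkeys are still valid."""

    def __init__(self):
        self.valid_lkeys = set()

    def post_umr(self, lkey):
        # A UMR request against a freed lkey completes with an error.
        if lkey not in self.valid_lkeys:
            raise WorkQueueError(f"memory bind operation error (lkey {lkey:#x})")


class MemoryRegion:
    def __init__(self, hw, lkey):
        self.hw = hw
        self.lkey = lkey
        hw.valid_lkeys.add(lkey)


class UmemOdp:
    """Models umem_odp: a mutex plus a back-pointer ('private') to the MR."""

    def __init__(self, mr):
        self.umem_mutex = threading.Lock()
        self.private = mr


def revoke_mr(umem, fixed=True):
    """Model of the revoke step in the deregistration flow."""
    mr = umem.private
    if fixed:
        # Patched behaviour: revoke under umem_mutex, then clear the pointer.
        with umem.umem_mutex:
            mr.hw.valid_lkeys.discard(mr.lkey)
            umem.private = None
    else:
        # Pre-patch behaviour: the lkey is freed without the mutex held
        # and the back-pointer is left in place.
        mr.hw.valid_lkeys.discard(mr.lkey)


def invalidate_range(umem):
    """Model of the invalidation path; returns True if a UMR was posted."""
    with umem.umem_mutex:
        mr = umem.private
        if mr is None:
            return False  # MR already revoked: nothing left to invalidate
        mr.hw.post_umr(mr.lkey)
        return True
```

In this model, calling `invalidate_range()` after `revoke_mr(umem, fixed=False)` raises `WorkQueueError`, mirroring the error CQE, while after `revoke_mr(umem, fixed=True)` it returns `False` without touching the hardware. Because both paths hold the same mutex in the fixed variant, an invalidation either completes before the revoke starts or sees the cleared pointer afterwards.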
Potential Impact
For European organizations, especially those in high-performance computing, data centers, cloud infrastructure, and financial services that run RDMA-enabled Linux servers with Mellanox hardware, this vulnerability can cause service disruption. Once the UMR QP enters an error state, RDMA connections may fail or degrade, affecting applications that depend on low-latency, high-throughput networking such as distributed databases, virtualization platforms, and HPC clusters. The issue is not known to lead directly to privilege escalation or remote code execution; the main risk is denial of service or interrupted data transfers. Organizations that use RDMA for storage or inter-node communication may see reduced reliability and increased operational risk. No exploits are known in the wild, but because the race arises from ordinary concurrent deregistration and invalidation (the reported trace comes from a madvise() call in ibv_rc_pingpong), unpatched systems can hit it accidentally as well as through deliberate triggering.
Mitigation Recommendations
European organizations should update to a Linux kernel that includes the fix for CVE-2025-21732 on all RDMA-enabled servers using Mellanox mlx5 hardware. Administrators should first identify systems that load the mlx5_ib driver and run applications that use ODP MRs. Where immediate patching is not feasible, consider temporarily avoiding ODP MR usage or restricting RDMA usage to reduce exposure. Monitoring kernel logs for mlx5_ib CQE errors, such as the "memory bind operation error" message shown in the description, can help detect occurrences of the race. Scheduling workloads to reduce simultaneous deregistration and invalidation may lower the likelihood of hitting the race but does not remove it. Because the fix is in the kernel driver, firmware updates and vendor configuration guidance complement kernel patching rather than replace it. Finally, track this CVE in regular vulnerability management and patching cycles.
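The log-monitoring recommendation can be sketched as a small scanner. This is a minimal illustration, assuming kernel log lines follow the format of the dmesg excerpt in the description; the function name `scan_kernel_log` and the layout of the returned dictionaries are hypothetical choices, not part of any existing tool.

```python
import re

# Patterns modelled on the dmesg excerpt in the advisory description.
WC_ERROR = re.compile(
    r"infiniband (\S+): dump_cqe:\d+:\(pid \d+\): "
    r"WC error: (\d+), Message: (.+)"
)
UMR_WARNING = re.compile(r"WARNING: .*mlx5r_umr_post_send_wait")


def scan_kernel_log(lines):
    """Return findings for lines that resemble the reported failure."""
    findings = []
    for line in lines:
        match = WC_ERROR.search(line)
        if match:
            findings.append({
                "kind": "wc_error",
                "device": match.group(1),
                "code": int(match.group(2)),
                "message": match.group(3).strip(),
            })
        elif UMR_WARNING.search(line):
            findings.append({"kind": "umr_warning", "line": line.strip()})
    return findings
```

The function takes any iterable of log lines, for example the output of `dmesg` split into lines or a saved log file opened for reading. A match only indicates that a similar error was logged; other conditions can produce the same messages, so findings still need manual review.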
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Switzerland, Italy
Technical Details
- Data Version: 5.1
- Assigner Short Name: Linux
- Date Reserved: 2024-12-29T08:45:45.756Z
- CISA Enriched: false
- CVSS Version: null
- State: PUBLISHED
Threat ID: 682d9832c4522896dcbe860c
Added to database: 5/21/2025, 9:09:06 AM
Last enriched: 6/30/2025, 8:39:45 AM
Last updated: 8/20/2025, 9:12:54 PM
Related Threats
- CVE-2025-43758: CWE-552 Files or Directories Accessible to External Parties in Liferay Portal (Medium)
- CVE-2025-52287: n/a (High)
- CVE-2025-55581: n/a (High)
- CVE-2025-52085: n/a (High)
- CVE-2025-43760: CWE-79: Cross-site Scripting in Liferay Portal (Medium)