CVE-2025-21892: Vulnerability in the Linux kernel

Severity: High
Tags: Vulnerability, CVE-2025-21892, cve, cve-2025-21892
Published: Thu Mar 27 2025 (03/27/2025, 14:57:17 UTC)
Source: CVE
Vendor/Project: Linux
Product: Linux

Description

In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix the recovery flow of the UMR QP

This patch addresses an issue in the recovery flow of the UMR QP, ensuring tasks do not get stuck, as highlighted by the call trace [1].

During recovery, before transitioning the QP to the RESET state, the software must wait for all outstanding WRs to complete. Failing to do so can cause the firmware to skip sending some flushed CQEs with errors and simply discard them upon the RESET, as per the IB specification. This race condition can result in lost CQEs and tasks becoming stuck.

To resolve this, the patch sends a final WR which serves only as a barrier before moving the QP state to RESET. Once a CQE is received for that final WR, it guarantees that no outstanding WRs remain, making it safe to transition the QP to RESET and subsequently back to RTS, restoring proper functionality.

Note: For the barrier WR, we simply reuse the failed and ready WR. Since the QP is in an error state, it will only receive IB_WC_WR_FLUSH_ERR. However, as it serves only as a barrier we don't care about its status.

[1]
INFO: task rdma_resource_l:1922 blocked for more than 120 seconds.
Tainted: G W 6.12.0-rc7+ #1626
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:rdma_resource_l state:D stack:0 pid:1922 tgid:1922 ppid:1369 flags:0x00004004
Call Trace:
<TASK>
__schedule+0x420/0xd30
schedule+0x47/0x130
schedule_timeout+0x280/0x300
? mark_held_locks+0x48/0x80
? lockdep_hardirqs_on_prepare+0xe5/0x1a0
wait_for_completion+0x75/0x130
mlx5r_umr_post_send_wait+0x3c2/0x5b0 [mlx5_ib]
? __pfx_mlx5r_umr_done+0x10/0x10 [mlx5_ib]
mlx5r_umr_revoke_mr+0x93/0xc0 [mlx5_ib]
__mlx5_ib_dereg_mr+0x299/0x520 [mlx5_ib]
? _raw_spin_unlock_irq+0x24/0x40
? wait_for_completion+0xfe/0x130
? rdma_restrack_put+0x63/0xe0 [ib_core]
ib_dereg_mr_user+0x5f/0x120 [ib_core]
? lock_release+0xc6/0x280
destroy_hw_idr_uobject+0x1d/0x60 [ib_uverbs]
uverbs_destroy_uobject+0x58/0x1d0 [ib_uverbs]
uobj_destroy+0x3f/0x70 [ib_uverbs]
ib_uverbs_cmd_verbs+0x3e4/0xbb0 [ib_uverbs]
? __pfx_uverbs_destroy_def_handler+0x10/0x10 [ib_uverbs]
? __lock_acquire+0x64e/0x2080
? mark_held_locks+0x48/0x80
? find_held_lock+0x2d/0xa0
? lock_acquire+0xc1/0x2f0
? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs]
? __fget_files+0xc3/0x1b0
ib_uverbs_ioctl+0xe7/0x170 [ib_uverbs]
? ib_uverbs_ioctl+0xcb/0x170 [ib_uverbs]
__x64_sys_ioctl+0x1b0/0xa70
do_syscall_64+0x6b/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f99c918b17b
RSP: 002b:00007ffc766d0468 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffc766d0578 RCX: 00007f99c918b17b
RDX: 00007ffc766d0560 RSI: 00000000c0181b01 RDI: 0000000000000003
RBP: 00007ffc766d0540 R08: 00007f99c8f99010 R09: 000000000000bd7e
R10: 00007f99c94c1c70 R11: 0000000000000246 R12: 00007ffc766d0530
R13: 000000000000001c R14: 0000000040246a80 R15: 0000000000000000
</TASK>

AI-Powered Analysis

Last updated: 06/27/2025, 23:42:22 UTC

Technical Analysis

CVE-2025-21892 is a vulnerability in the Linux kernel's RDMA (Remote Direct Memory Access) subsystem, specifically in the Mellanox mlx5 driver's handling of the UMR (User-Mode Memory Registration) Queue Pair (QP). The flaw is in the recovery flow of the UMR QP: the software does not wait for all outstanding Work Requests (WRs) to complete before transitioning the QP to the RESET state. Per the InfiniBand (IB) specification, the firmware may then skip sending some flushed Completion Queue Entries (CQEs) with error status and simply discard them on RESET. When the reset races with the flush in this way, some CQEs are never delivered, and any task waiting on one of them blocks indefinitely. The result is hung processes and a potential denial of service.

The fix sends a final WR that serves only as a barrier before the QP moves to RESET. Once the CQE for that barrier WR arrives, no earlier WR can still be outstanding, so it is safe to move the QP to RESET and then back to RTS. The barrier reuses the failed WR. Because the QP is in the error state, the barrier completes with IB_WC_WR_FLUSH_ERR, and its status is ignored.

The vulnerability affects specific Linux kernel versions identified by commit hashes, and it was publicly disclosed in March 2025. No exploits are known in the wild, and no CVSS score has been assigned. The issue affects only systems using RDMA with Mellanox mlx5 hardware, which is common in high-performance computing, data centers, and enterprise environments that rely on low-latency, high-throughput networking. The call trace in the description shows a task (rdma_resource_l) blocked for more than 120 seconds in mlx5r_umr_post_send_wait while deregistering a memory region. This vulnerability does not directly lead to privilege escalation or data leakage, but it can cause significant availability degradation and operational disruption on affected systems.
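The race and the barrier fix can be illustrated with a toy model. This is a minimal sketch, not the mlx5 driver or firmware logic: the class ToyQP, its methods, and the assumption that flushed completions arrive strictly in posting order are all hypothetical simplifications introduced here for illustration.

```python
from collections import deque


class ToyQP:
    """Toy model of a QP in the error state (hypothetical, not driver code)."""

    def __init__(self):
        self.outstanding = deque()  # WRs posted but not yet completed
        self.cq = []                # (wr_id, status) pairs seen by software

    def post(self, wr_id):
        self.outstanding.append(wr_id)

    def flush_one(self):
        """Deliver one flushed completion, oldest WR first."""
        if self.outstanding:
            self.cq.append((self.outstanding.popleft(), "WR_FLUSH_ERR"))

    def reset(self):
        """Move to RESET: WRs still outstanding are dropped without a CQE."""
        lost = list(self.outstanding)
        self.outstanding.clear()
        return lost


def recover_without_barrier(qp):
    # Flawed flow: reset immediately, racing with CQE delivery.
    return qp.reset()


def recover_with_barrier(qp, barrier_id="barrier"):
    # Fixed flow: post a final WR, wait for its CQE, then reset.
    qp.post(barrier_id)
    while not any(wr_id == barrier_id for wr_id, _ in qp.cq):
        qp.flush_one()
    return qp.reset()


# Two WRs are in flight and only one has been flushed when recovery starts.
buggy = ToyQP()
buggy.post("wr1")
buggy.post("wr2")
buggy.flush_one()
lost_buggy = recover_without_barrier(buggy)

fixed = ToyQP()
fixed.post("wr1")
fixed.post("wr2")
fixed.flush_one()
lost_fixed = recover_with_barrier(fixed)

print("lost without barrier:", lost_buggy)
print("lost with barrier:", lost_fixed)
```

In the flawed flow, the waiter for wr2 never sees a completion, which corresponds to the task stuck in wait_for_completion in the trace. In the barrier flow, every earlier WR has a CQE by the time the barrier's CQE arrives, so nothing is dropped at RESET.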

Potential Impact

For European organizations, especially those operating data centers, cloud infrastructure, or HPC clusters utilizing RDMA-enabled Linux servers with Mellanox mlx5 hardware, this vulnerability poses a risk of service disruption. The hang condition caused by stuck tasks can lead to degraded performance, interrupted workflows, and potential downtime of critical applications relying on RDMA for fast data transfers. Industries such as finance, telecommunications, research institutions, and large enterprises with latency-sensitive workloads may experience operational impacts. Although the vulnerability does not appear to allow remote code execution or data compromise, the denial of service effect can affect business continuity and SLAs. Recovery from such hangs may require manual intervention or system reboots, increasing operational costs and complexity. Given the widespread use of Linux in European IT infrastructure and the adoption of RDMA in performance-critical environments, the impact is non-trivial. However, organizations not using RDMA or Mellanox mlx5 hardware are unlikely to be affected. The absence of known exploits reduces immediate risk, but the vulnerability should be addressed promptly to avoid potential future exploitation or accidental outages.
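Since only hosts running the mlx5 RDMA driver are exposed, a first triage step is to check whether the mlx5_ib module is loaded. The sketch below parses text in the format of /proc/modules, where the first field of each line is the module name. The function names and the sample text are illustrative assumptions, and a driver compiled into the kernel rather than loaded as a module would not appear in this listing, so a negative result is not conclusive.

```python
def loaded_modules(modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def uses_mlx5_rdma(modules_text):
    """True if the mlx5 InfiniBand/RDMA driver module is listed."""
    return "mlx5_ib" in loaded_modules(modules_text)


# Illustrative sample only; on a real host, read /proc/modules instead.
sample = (
    "mlx5_ib 450560 0 - Live 0x0000000000000000\n"
    "ib_uverbs 200704 1 mlx5_ib, Live 0x0000000000000000\n"
    "mlx5_core 2191360 1 mlx5_ib, Live 0x0000000000000000\n"
)
exposed = uses_mlx5_rdma(sample)
not_exposed = uses_mlx5_rdma("ext4 1000000 1 - Live 0x0000000000000000\n")
print(exposed, not_exposed)
```

On a live system the same check could be run with the contents of /proc/modules passed in as modules_text. A positive result only indicates that the driver is present and says nothing about whether the running kernel already contains the fix.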

Mitigation Recommendations

European organizations should prioritize updating to a kernel that includes the fix. The issue is already resolved upstream, and because the change is in the mlx5 driver's recovery flow, a patched kernel from upstream stable or from the distribution vendor is the only complete remedy. Where immediate patching is not feasible, administrators should monitor for the symptom: kernel hung-task messages whose call trace includes mlx5 UMR functions such as mlx5r_umr_post_send_wait. Alerting on hung-task messages and RDMA subsystem logs helps detect the problem early. Organizations should also review their RDMA usage and consider temporarily disabling or limiting RDMA workloads on affected hardware until patches are applied. Patches should be tested in staging before production deployment. For critical systems, coordinate with hardware vendors and Linux distribution maintainers to obtain timely updates and support. Network segmentation and strict access controls on RDMA-capable nodes can reduce exposure. Finally, documenting recovery procedures for hung RDMA tasks will help minimize downtime if the issue occurs before patching.
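The monitoring step can be sketched as a small log scanner. It looks for the kernel's hung-task message and reports only those entries whose following trace contains mlx5r_umr_post_send_wait, the frame seen in the call trace for this CVE. The function name find_umr_hangs, the regular expression, and the second entry in the sample log (backup_job) are assumptions made for illustration. Real kernel logs also carry timestamps and other prefixes, which this pattern tolerates only because it searches within lines.

```python
import re

# Matches the hung-task header line emitted by the kernel.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
UMR_FRAME = "mlx5r_umr_post_send_wait"


def find_umr_hangs(log_text):
    """Return (comm, pid, seconds) for hung tasks whose trace has the UMR frame."""
    matches = list(HUNG_RE.finditer(log_text))
    hits = []
    for i, m in enumerate(matches):
        # A report runs from this header to the next header (or end of log).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        if UMR_FRAME in log_text[m.end():end]:
            hits.append((m["comm"], int(m["pid"]), int(m["secs"])))
    return hits


sample_log = """\
INFO: task rdma_resource_l:1922 blocked for more than 120 seconds.
Call Trace:
 wait_for_completion+0x75/0x130
 mlx5r_umr_post_send_wait+0x3c2/0x5b0 [mlx5_ib]
 mlx5r_umr_revoke_mr+0x93/0xc0 [mlx5_ib]
INFO: task backup_job:4242 blocked for more than 120 seconds.
Call Trace:
 io_schedule+0x12/0x40
"""
hits = find_umr_hangs(sample_log)
print(hits)
```

The scanner takes log text as input so that it can be fed from whatever source a site already collects, for example saved dmesg output. Matching on the function name rather than on the task name keeps unrelated hung tasks out of the alert.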


Technical Details

Data Version: 5.1
Assigner Short Name: Linux
Date Reserved: 2024-12-29T08:45:45.783Z
CISA Enriched: false
CVSS Version: null
State: PUBLISHED

Threat ID: 682d9820c4522896dcbdd37e

Added to database: 5/21/2025, 9:08:48 AM

Last enriched: 6/27/2025, 11:42:22 PM

Last updated: 8/12/2025, 5:51:07 AM

