
CVE-2024-50095: Vulnerability in Linux Linux

High
Vulnerability, CVE-2024-50095, cve, cve-2024-50095
Published: Tue Nov 05 2024 (11/05/2024, 17:04:58 UTC)
Source: CVE
Vendor/Project: Linux
Product: Linux

Description

In the Linux kernel, the following vulnerability has been resolved:

RDMA/mad: Improve handling of timed out WRs of mad agent

The current timeout handler of the mad agent acquires and releases the mad_agent_priv lock for every timed-out WR. This causes heavy lock contention when a large number of WRs must be handled inside the timeout handler.

This leads to a soft lockup, with the trace below, in some use cases where the rdma-cm path is used to establish connections between peer nodes.

Trace:
-----
  BUG: soft lockup - CPU#4 stuck for 26s! [kworker/u128:3:19767]
  CPU: 4 PID: 19767 Comm: kworker/u128:3 Kdump: loaded Tainted: G OE ------- --- 5.14.0-427.13.1.el9_4.x86_64 #1
  Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.4.8 11/26/2019
  Workqueue: ib_mad1 timeout_sends [ib_core]
  RIP: 0010:__do_softirq+0x78/0x2ac
  RSP: 0018:ffffb253449e4f98 EFLAGS: 00000246
  RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 000000000000001f
  RDX: 000000000000001d RSI: 000000003d1879ab RDI: fff363b66fd3a86b
  RBP: ffffb253604cbcd8 R08: 0000009065635f3b R09: 0000000000000000
  R10: 0000000000000040 R11: ffffb253449e4ff8 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000040
  FS: 0000000000000000(0000) GS:ffff8caa1fc80000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fd9ec9db900 CR3: 0000000891934006 CR4: 00000000007706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
  <IRQ>
  ? show_trace_log_lvl+0x1c4/0x2df
  ? show_trace_log_lvl+0x1c4/0x2df
  ? __irq_exit_rcu+0xa1/0xc0
  ? watchdog_timer_fn+0x1b2/0x210
  ? __pfx_watchdog_timer_fn+0x10/0x10
  ? __hrtimer_run_queues+0x127/0x2c0
  ? hrtimer_interrupt+0xfc/0x210
  ? __sysvec_apic_timer_interrupt+0x5c/0x110
  ? sysvec_apic_timer_interrupt+0x37/0x90
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? __do_softirq+0x78/0x2ac
  ? __do_softirq+0x60/0x2ac
  __irq_exit_rcu+0xa1/0xc0
  sysvec_call_function_single+0x72/0x90
  </IRQ>
  <TASK>
  asm_sysvec_call_function_single+0x16/0x20
  RIP: 0010:_raw_spin_unlock_irq+0x14/0x30
  RSP: 0018:ffffb253604cbd88 EFLAGS: 00000247
  RAX: 000000000001960d RBX: 0000000000000002 RCX: ffff8cad2a064800
  RDX: 000000008020001b RSI: 0000000000000001 RDI: ffff8cad5d39f66c
  RBP: ffff8cad5d39f600 R08: 0000000000000001 R09: 0000000000000000
  R10: ffff8caa443e0c00 R11: ffffb253604cbcd8 R12: ffff8cacb8682538
  R13: 0000000000000005 R14: ffffb253604cbd90 R15: ffff8cad5d39f66c
  cm_process_send_error+0x122/0x1d0 [ib_cm]
  timeout_sends+0x1dd/0x270 [ib_core]
  process_one_work+0x1e2/0x3b0
  ? __pfx_worker_thread+0x10/0x10
  worker_thread+0x50/0x3a0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xdd/0x100
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x29/0x50
  </TASK>

The fix simplifies the timeout handler: it builds a local list of timed-out WRs and invokes the send handler after the list has been created. The new method acquires and releases the lock once to fetch the list, which reduces lock contention when processing a large number of WRs.

AI-Powered Analysis

Last updated: 06/28/2025, 17:10:56 UTC

Technical Analysis

CVE-2024-50095 affects the Linux kernel's RDMA (Remote Direct Memory Access) subsystem, specifically the timeout handling of the MAD (Management Datagram) agent. The MAD agent handles management communication in RDMA networks, which are common in high-performance computing and data center environments that need low-latency, high-throughput data transfers.

The vulnerability arises from the way the timeout handler processes timed-out Work Requests (WRs). In affected versions, the handler acquires and releases the mad_agent_priv lock once for every timed-out WR. When a large number of WRs time out together, this causes heavy lock contention and can trigger a soft lockup: the kernel watchdog reports that a CPU has been running kernel code for an extended period without letting other tasks run (26 seconds in the trace). The result can be system unresponsiveness or degraded performance.

The root cause is inefficient lock management in the timeout handler. The fix simplifies the handler so that it acquires and releases the lock only once to build a local list of timed-out WRs, and invokes the send handler after that list has been created. This reduces lock contention and keeps the CPU from being stuck in the timeout processing loop.

The issue is most relevant where the rdma-cm path is used to establish connections between peer nodes, as in clustered or distributed systems that rely on RDMA. No exploits are known in the wild, and no CVSS score has been assigned. Affected kernel versions are identified by commit hash, meaning the issue is present in kernel builds that predate the fix. The vulnerability is specific to environments that use RDMA, typically enterprise data centers, cloud infrastructure, and HPC clusters.
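
The difference between the two locking patterns can be illustrated with a small user-space sketch. This is not the kernel code: it is a Python analogy in which CountingLock, drain_per_item, drain_batched, and make_wait_list are hypothetical names, the wait list is assumed to be ordered by deadline, and a threading.Lock stands in for the mad_agent_priv spinlock. It only shows how many times each pattern takes the lock.

```python
import threading
from collections import deque


class CountingLock:
    """Hypothetical lock wrapper that counts how often it is acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def make_wait_list(n_expired, n_pending, now):
    """Build a deadline-ordered deque of (deadline, request_id) entries."""
    expired = [(now - 1, i) for i in range(n_expired)]
    pending = [(now + 10, n_expired + i) for i in range(n_pending)]
    return deque(expired + pending)


def drain_per_item(lock, wait_list, now, handler):
    """Pre-fix shape: take and drop the lock around every timed-out entry."""
    while True:
        with lock:
            if not wait_list or wait_list[0][0] > now:
                return
            item = wait_list.popleft()
        handler(item)  # the handler runs with the lock released


def drain_batched(lock, wait_list, now, handler):
    """Post-fix shape: take the lock once and move expired entries to a local list."""
    local_list = []
    with lock:
        while wait_list and wait_list[0][0] <= now:
            local_list.append(wait_list.popleft())
    for item in local_list:
        handler(item)  # handlers run after the lock has been released
```

In this sketch, draining 1000 expired entries takes the lock 1001 times with drain_per_item (once per entry plus the final check) and once with drain_batched, while both hand the same entries to the handler in the same order.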

Potential Impact

For European organizations, especially those operating data centers, cloud services, or high-performance computing clusters, this vulnerability can lead to significant operational disruptions. The soft lockup condition can cause CPU cores to become unresponsive, leading to degraded system performance, potential downtime, and interruption of critical services relying on RDMA communication. This is particularly impactful for industries such as finance, telecommunications, research institutions, and cloud providers where RDMA is used to optimize network performance and reduce latency. The vulnerability does not directly expose confidentiality or integrity risks but affects availability and reliability of systems. In environments with high volumes of RDMA traffic, the risk of encountering this issue increases, potentially causing cascading failures in distributed applications. Additionally, recovery from soft lockups may require system reboots or manual intervention, increasing operational costs and downtime.

Mitigation Recommendations

European organizations should prioritize updating their Linux kernels to versions where this vulnerability is patched. Since the fix involves changes to the MAD agent's timeout handler, applying the latest stable kernel releases from trusted sources or vendor-provided updates is critical. For environments where immediate patching is not feasible, administrators should monitor RDMA-related system logs and kernel messages for signs of soft lockups or excessive locking contention. Reducing the load on RDMA communication paths or temporarily disabling RDMA features where possible can mitigate exposure. Organizations should also implement robust monitoring and alerting for kernel soft lockups and CPU stalls to enable rapid detection and response. Testing patches in staging environments before deployment is recommended to ensure compatibility with existing RDMA workloads. Finally, collaborating with hardware vendors (e.g., Dell PowerEdge servers mentioned in the trace) to ensure firmware and drivers are up to date can help maintain overall system stability.
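
As one possible way to monitor kernel messages for this specific signature, the sketch below scans kernel log text for soft-lockup reports that are followed by symbols seen in the published trace (timeout_sends, ib_mad, cm_process_send_error). The function name find_mad_soft_lockups, the marker list, and the 60-line context window are assumptions made for illustration; the regular expression is modelled on the "BUG: soft lockup" line in the trace and may need adjusting for other kernel versions.

```python
import re

# Pattern modelled on the soft-lockup line in the published trace.
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)

# Symbols taken from the published trace; treating them as markers is an assumption.
MAD_MARKERS = ("timeout_sends", "ib_mad", "cm_process_send_error")


def find_mad_soft_lockups(log_text, context_lines=60):
    """Return soft-lockup reports whose following lines mention MAD timeout symbols."""
    lines = log_text.splitlines()
    reports = []
    for idx, line in enumerate(lines):
        match = SOFT_LOCKUP_RE.search(line)
        if not match:
            continue
        # Look only at the lines that follow the report header.
        window = lines[idx: idx + context_lines]
        hits = sorted({m for m in MAD_MARKERS for w in window if m in w})
        if hits:
            reports.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "task": match.group("task"),
                "markers": hits,
            })
    return reports
```

The input can be any captured kernel log text, for example the output of dmesg or journalctl -k saved to a string. A report with no matching markers in its window is ignored, so unrelated soft lockups are not attributed to this issue.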


Technical Details

Data Version: 5.1
Assigner Short Name: Linux
Date Reserved: 2024-10-21T19:36:19.944Z
CISA Enriched: false
CVSS Version: null
State: PUBLISHED

Threat ID: 682d9825c4522896dcbdff14

Added to database: 5/21/2025, 9:08:53 AM

Last enriched: 6/28/2025, 5:10:56 PM

Last updated: 8/17/2025, 12:52:35 PM

Views: 15

