
CVE-2024-38557: Vulnerability in the Linux kernel

High
Published: Wed Jun 19 2024 (06/19/2024, 13:35:27 UTC)
Source: CVE
Vendor/Project: Linux
Product: Linux

Description

In the Linux kernel, the following vulnerability has been resolved:

net/mlx5: Reload only IB representors upon lag disable/enable

On lag disable, the bond IB device along with all of its representors are destroyed, and then the slaves' representors get reloaded. In case the slave IB representor load fails, the eswitch error flow unloads all representors, including ethernet representors, where the netdevs get detached and removed from lag bond. Such flow is inaccurate as the lag driver is not responsible for loading/unloading ethernet representors.

Furthermore, the flow described above begins by holding lag lock to prevent bond changes during disable flow. However, when reaching the ethernet representors detachment from lag, the lag lock is required again, triggering the following deadlock:

Call trace:
  __switch_to+0xf4/0x148
  __schedule+0x2c8/0x7d0
  schedule+0x50/0xe0
  schedule_preempt_disabled+0x18/0x28
  __mutex_lock.isra.13+0x2b8/0x570
  __mutex_lock_slowpath+0x1c/0x28
  mutex_lock+0x4c/0x68
  mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
  mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
  mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
  mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
  mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
  mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
  mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
  mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
  mlx5_disable_lag+0x130/0x138 [mlx5_core]
  mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
  mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
  devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
  genl_family_rcv_msg_doit.isra.17+0xe8/0x138
  genl_rcv_msg+0xe4/0x220
  netlink_rcv_skb+0x44/0x108
  genl_rcv+0x40/0x58
  netlink_unicast+0x198/0x268
  netlink_sendmsg+0x1d4/0x418
  sock_sendmsg+0x54/0x60
  __sys_sendto+0xf4/0x120
  __arm64_sys_sendto+0x30/0x40
  el0_svc_common+0x8c/0x120
  do_el0_svc+0x30/0xa0
  el0_svc+0x20/0x30
  el0_sync_handler+0x90/0xb8
  el0_sync+0x160/0x180

Thus, upon lag enable/disable, load and unload only the IB representors of the slaves preventing the deadlock mentioned above.

While at it, refactor the mlx5_esw_offloads_rep_load() function to have a static helper method for its internal logic, in symmetry with the representor unload design.
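The call trace above has a recognizable shape: the frame that blocks on the mutex (mlx5_lag_remove_netdev) appears closer to the top than the frame annotated as holding ldev->lock (mlx5_lag_disable_change). The following Python helper is a minimal sketch for checking whether a captured trace has that shape. The function names `frames` and `matches_lag_deadlock` are hypothetical, and the sketch assumes frames are formatted as `symbol+0xOFF/0xSIZE`, as in the trace above.

```python
import re

# Frame names taken from the call trace in the description.
LOCK_HOLDER = "mlx5_lag_disable_change"  # annotated "hold ldev->lock"
LOCK_WAITER = "mlx5_lag_remove_netdev"   # blocks in mutex_lock()

# Matches "symbol+0xOFF/0xSIZE"; symbols may contain dots (e.g. ".isra.13").
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace: str) -> list[str]:
    """Return symbol names in the order they appear (innermost frame first)."""
    return FRAME_RE.findall(trace)


def matches_lag_deadlock(trace: str) -> bool:
    """True if the waiter frame sits above the lock-holder frame in the trace."""
    names = frames(trace)
    if LOCK_HOLDER not in names or LOCK_WAITER not in names:
        return False
    # Innermost frames come first, so the waiter must precede the holder.
    return names.index(LOCK_WAITER) < names.index(LOCK_HOLDER)
```

The regular expression works on whitespace-separated frames, so the trace can be passed either one frame per line or flattened onto a single line.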

AI-Powered Analysis

Last updated: 06/29/2025, 11:11:57 UTC

Technical Analysis

CVE-2024-38557 is a vulnerability in the Linux kernel's mlx5 driver, which handles Mellanox (now NVIDIA) ConnectX-4 and newer network adapters. It concerns the interaction between Link Aggregation Group (LAG) management and InfiniBand (IB) representors, and it arises when LAG is disabled or enabled.

When LAG is disabled, the bond IB device and all of its representors are destroyed, and the slaves' IB representors are then reloaded. If loading a slave IB representor fails, the eswitch error flow unloads all representors, including the Ethernet representors, which the LAG driver is not responsible for loading or unloading. Unloading an Ethernet representor detaches its netdev and removes it from the LAG bond.

The disable flow starts by taking the lag lock (ldev->lock) to prevent bond changes while it runs. Removing a netdev from the bond also requires the lag lock, so the error flow tries to acquire a mutex that the same call chain already holds, and the task deadlocks. In the call trace, mlx5_lag_disable_change() holds the lock and mlx5_lag_remove_netdev() blocks in mutex_lock() further up the same stack.

The fix restricts LAG enable/disable to loading and unloading only the IB representors of the slaves, so the Ethernet representors are never touched and the lock is never requested a second time. The same change refactors mlx5_esw_offloads_rep_load() to use a static helper for its internal logic, mirroring the representor unload design.

The issue affects kernel versions that contain the vulnerable lag disable/enable flow and predate the fix, on systems using mlx5-based devices with LAG and IB representors. Triggering it deadlocks the kernel task performing the operation, resulting in a denial of service (DoS). No exploits are known to be in the wild at this time.
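The locking pattern behind the deadlock can be illustrated with a small user-space model. This is a sketch in Python, not the driver code: `LagDevice`, `disable_lag_buggy`, and `disable_lag_fixed` are hypothetical names, `threading.Lock` stands in for the kernel mutex because it is also non-reentrant, and a timeout is used so the example reports the deadlock instead of hanging.

```python
import threading


class LagDevice:
    """Toy stand-in for the lag device; not the real mlx5 structure."""

    def __init__(self):
        self.lock = threading.Lock()  # non-reentrant, like a kernel mutex
        self.bonded_netdevs = {"eth0", "eth1"}

    def remove_netdev(self, name, timeout=0.2):
        # Models mlx5_lag_remove_netdev(): it needs the lag lock itself.
        if not self.lock.acquire(timeout=timeout):
            raise RuntimeError("deadlock: lag lock already held by this flow")
        try:
            self.bonded_netdevs.discard(name)
        finally:
            self.lock.release()


def disable_lag_buggy(ldev, ib_load_ok):
    """Error flow unloads every representor while the lag lock is held."""
    with ldev.lock:
        if not ib_load_ok:
            # Unloading Ethernet representors detaches netdevs from the bond,
            # which asks for the lag lock a second time.
            for name in list(ldev.bonded_netdevs):
                ldev.remove_netdev(name)
            return "all representors unloaded"
        return "IB representors reloaded"


def disable_lag_fixed(ldev, ib_load_ok):
    """Error flow touches only IB representors; no second lock request."""
    with ldev.lock:
        if not ib_load_ok:
            return "IB representors unloaded"
        return "IB representors reloaded"
```

With `ib_load_ok=False`, `disable_lag_buggy` raises after the timeout because the lock it wants is held by its own caller, while `disable_lag_fixed` returns and leaves the bonded netdevs untouched. In the kernel there is no timeout, so the task blocks indefinitely.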

Potential Impact

For European organizations, this vulnerability poses a risk primarily to data centers, cloud providers, and enterprises that deploy Linux servers with Mellanox mlx5 network adapters, especially those that use LAG for redundancy and performance together with IB representors. The deadlock leaves the affected kernel task blocked indefinitely while it holds the lag lock, which can stall further LAG and representor operations on the device and lead to network outages or loss of connectivity on critical infrastructure. This can disrupt business operations and service availability, with possible financial or reputational consequences. Organizations running high-performance computing (HPC), scientific research, or financial trading platforms on mlx5 devices are the most exposed. The vulnerability does not appear to allow privilege escalation or remote code execution. In the reported call trace it is reached through a devlink eswitch mode change sent over netlink, which is a local administrative operation that toggles LAG state. Given the prevalence of Linux in European enterprise and cloud environments, the impact could be significant if unpatched systems sit in critical network paths.

Mitigation Recommendations

European organizations should prioritize updating their Linux kernels to versions that include the patch for CVE-2024-38557 as soon as they become available from their Linux distribution vendors. In the interim, administrators should avoid toggling LAG interfaces on mlx5 devices unnecessarily and monitor system logs for errors related to mlx5 representors or LAG operations. Network teams should audit their infrastructure to identify systems using mlx5 adapters with LAG and InfiniBand representors and plan maintenance windows for kernel upgrades. Additionally, implementing robust monitoring and alerting for kernel deadlocks or network interface failures can help detect exploitation attempts or accidental triggers early. Where possible, testing kernel updates in staging environments that replicate production LAG and IB configurations is recommended to ensure stability. Vendors and integrators should also review their custom kernel modules or patches that interact with mlx5 drivers to avoid conflicting behaviors. Finally, maintaining good backup and recovery procedures will mitigate operational impacts in case of outages caused by this vulnerability.
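The audit step can be partly automated. As a minimal sketch, the Python function below lists network interfaces bound to the mlx5_core driver by reading the `device/driver` symlink that Linux sysfs exposes under `/sys/class/net/<interface>/`. The function name `mlx5_interfaces` and its `sysfs_net` parameter are hypothetical. The sketch only identifies the driver; it does not check whether LAG or IB representors are in use, nor whether the running kernel carries the fix.

```python
import os


def mlx5_interfaces(sysfs_net="/sys/class/net"):
    """Return interface names whose device is bound to the mlx5_core driver."""
    found = []
    if not os.path.isdir(sysfs_net):
        return found
    for iface in sorted(os.listdir(sysfs_net)):
        driver_link = os.path.join(sysfs_net, iface, "device", "driver")
        if not os.path.islink(driver_link):
            continue  # virtual interfaces (lo, bond0, ...) have no driver link
        # The link target ends with the driver name, e.g. ".../drivers/mlx5_core".
        driver = os.path.basename(os.readlink(driver_link))
        if driver == "mlx5_core":
            found.append(iface)
    return found


if __name__ == "__main__":
    for name in mlx5_interfaces():
        print(name)
```

Run on each host, the output gives a starting list of systems to check against the distribution's advisory for CVE-2024-38557 before scheduling kernel upgrades.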


Technical Details

Data Version: 5.1
Assigner Short Name: Linux
Date Reserved: 2024-06-18T19:36:34.921Z
CISA Enriched: true
CVSS Version: null
State: PUBLISHED

Threat ID: 682d9829c4522896dcbe296e

Added to database: 5/21/2025, 9:08:57 AM

Last enriched: 6/29/2025, 11:11:57 AM

Last updated: 7/29/2025, 10:20:53 AM
