CVE-2024-53169: Vulnerability in the Linux Kernel
Description
In the Linux kernel, the following vulnerability has been resolved: nvme-fabrics: fix kernel crash while shutting down controller.

The nvme keep-alive operation, which executes at a periodic interval, could potentially sneak in while shutting down a fabric controller. This may lead to a race between the fabric controller admin queue destroy code path (invoked while shutting down the controller) and the hw/hctx queue dispatcher called from the nvme keep-alive async request queuing operation. This race could lead to the kernel crash shown below:

Call Trace:
  autoremove_wake_function+0x0/0xbc (unreliable)
  __blk_mq_sched_dispatch_requests+0x114/0x24c
  blk_mq_sched_dispatch_requests+0x44/0x84
  blk_mq_run_hw_queue+0x140/0x220
  nvme_keep_alive_work+0xc8/0x19c [nvme_core]
  process_one_work+0x200/0x4e0
  worker_thread+0x340/0x504
  kthread+0x138/0x140
  start_kernel_thread+0x14/0x18

While shutting down the fabric controller, if an nvme keep-alive request sneaks in, it is flushed off. The nvme_keep_alive_end_io function is then invoked to handle the end of the keep-alive operation; it decrements the admin->q_usage_counter, and if this was the last/only request in the admin queue, the counter drops to zero. If that happens, the blk-mq destroy queue operation (blk_mq_destroy_queue()), which could be running simultaneously on another CPU (as this is the controller shutdown code path), makes forward progress and deletes the admin queue. From that point onward the admin queue resources must not be accessed. However, the nvme keep-alive thread running the hw/hctx queue dispatch operation has not yet finished its work, so it can still access the admin queue resources after the admin queue has already been deleted, which causes the crash shown above.

This kernel crash is a regression caused by the changes implemented in commit a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()"). Ideally, keep-alive should be stopped before destroying the admin queue and freeing the admin tagset so that it cannot sneak in during the shutdown operation. However, that commit removed the keep-alive stop operation from the beginning of the controller shutdown code path and moved it into nvme_uninit_ctrl(), which executes very late in the shutdown code path, after the admin queue is destroyed and its tagset is removed. This change created the possibility of keep-alive sneaking in, interfering with the shutdown operation, and causing the observed kernel crash.

To fix the observed crash, nvme_stop_keep_alive() is moved from nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This ensures that the admin queue is not deleted until the keep-alive operation is finished (if it is in flight) or cancelled, which contains the race condition explained above and avoids the crash. Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set(), instead of adding it back to the beginning of the controller shutdown code path in nvme_stop_ctrl() as was the case before commit a54a93d0e359, also saves one call site of nvme_stop_keep_alive().
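The shape of the fix can be sketched as follows. This is a simplified, illustrative sketch based on the commit description above, not the verbatim upstream function: the field names (ctrl->admin_q, ctrl->admin_tagset) follow common nvme-core naming, and the fabrics connect queue teardown is elided.

    /*
     * Simplified sketch of nvme_remove_admin_tag_set() after the fix
     * (illustrative only, not verbatim kernel code).
     */
    void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
    {
            /*
             * Stop the periodic keep-alive work before tearing down the
             * admin queue and freeing its tagset, so a keep-alive request
             * can no longer be dispatched on a dying queue.
             */
            nvme_stop_keep_alive(ctrl);

            blk_mq_destroy_queue(ctrl->admin_q);
            blk_put_queue(ctrl->admin_q);
            /* ... fabrics connect queue teardown elided ... */
            blk_mq_free_tag_set(ctrl->admin_tagset);
    }

With the keep-alive work stopped (and any in-flight instance completed or cancelled) before blk_mq_destroy_queue(), the dispatch path in the crash trace above can no longer run concurrently with the admin queue teardown.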
AI Analysis
Technical Summary
CVE-2024-53169 is a vulnerability in the Linux kernel's NVMe fabrics subsystem that can cause a kernel crash during the shutdown of an NVMe fabric controller. The root cause is a race condition between the NVMe keep-alive operation and the shutdown sequence of the fabric controller's admin queue. Specifically, the keep-alive operation runs periodically and may sneak in while the controller is shutting down. During shutdown, the admin queue is destroyed and its resources freed. However, if the keep-alive operation is still in progress or starts just before the admin queue is destroyed, it may access the now-deleted admin queue resources. This leads to a use-after-free scenario causing a kernel crash. The vulnerability was introduced by a regression in a prior commit (a54a93d0e359) that moved the stopping of the keep-alive operation to a late stage in the shutdown process, after the admin queue was already destroyed. The fix involves moving the stop operation for the keep-alive to an earlier point in the shutdown sequence (specifically to nvme_remove_admin_tag_set()), ensuring that the keep-alive operation is fully stopped or completed before the admin queue is destroyed, thus preventing the race condition and subsequent crash. This vulnerability affects Linux kernel versions containing the specified commits and impacts systems using NVMe over fabrics, which is a networked storage protocol used in high-performance and enterprise environments. There is no indication of exploitation in the wild, and no CVSS score has been assigned yet.
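The underlying pattern is general: a periodic or delayed work item must be cancelled synchronously before the resource it dispatches on is torn down. The following minimal sketch illustrates that ordering with hypothetical names (my_ctrl, my_ctrl_teardown, ka_work); it is not nvme code, only an illustration of the ordering the fix restores.

    /* Hypothetical illustration of the general pattern; not nvme code. */
    #include <linux/workqueue.h>
    #include <linux/blk-mq.h>
    #include <linux/blkdev.h>

    struct my_ctrl {
            struct delayed_work ka_work;    /* periodic keep-alive work     */
            struct request_queue *admin_q;  /* queue the work dispatches on */
    };

    static void my_ctrl_teardown(struct my_ctrl *ctrl)
    {
            /*
             * Cancel the periodic work and wait for any running instance to
             * finish BEFORE destroying the queue it may still be using.
             * Doing this only after the queue is gone reintroduces the
             * use-after-free described above.
             */
            cancel_delayed_work_sync(&ctrl->ka_work);

            blk_mq_destroy_queue(ctrl->admin_q);
            blk_put_queue(ctrl->admin_q);
    }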
Potential Impact
For European organizations, this vulnerability primarily affects systems running Linux kernels with NVMe over Fabrics support, particularly data centers, cloud providers, and enterprises that use NVMe storage networks for high-performance storage. A kernel crash caused by this race condition can lead to system instability, unexpected reboots, or downtime, potentially disrupting critical services and applications. The impact is essentially a denial of service: the vulnerability does not appear to allow privilege escalation or remote code execution, but the resulting kernel panic causes loss of availability and can lose data in flight. Organizations relying on NVMe fabrics for storage networking in sectors such as finance, telecommunications, healthcare, and critical infrastructure could face operational disruptions. The lack of known exploits reduces immediate risk, but the vulnerability should be addressed promptly to maintain system stability and reliability.
Mitigation Recommendations
European organizations should apply the Linux kernel patches that address this race condition as soon as they are available from their Linux distribution vendors. Specifically, ensure that the running kernel includes the fix that moves the nvme_stop_keep_alive() call into nvme_remove_admin_tag_set(). Until patched, organizations can reduce risk by minimizing shutdowns or reboots of NVMe fabric controllers and avoiding abrupt shutdowns that might trigger the race. Monitoring kernel logs for nvme-related errors or crashes can help detect occurrences of this race condition. Additionally, organizations should review their NVMe fabric controller shutdown procedures to ensure orderly and controlled shutdowns. For critical systems, consider redundancy and failover mechanisms to reduce the impact of potential crashes. Coordination with hardware and storage vendors to confirm compatibility with patched kernels is also recommended.
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Switzerland, Italy
Technical Details
- Data Version: 5.1
- Assigner Short Name: Linux
- Date Reserved: 2024-11-19T17:17:25.005Z
- Cisa Enriched: false
- Cvss Version: null
- State: PUBLISHED
Threat ID: 682d9820c4522896dcbdd056
Added to database: 5/21/2025, 9:08:48 AM
Last enriched: 6/27/2025, 10:27:14 PM
Last updated: 7/26/2025, 10:54:52 AM
Related Threats
- CVE-2025-8285: CWE-862: Missing Authorization in Mattermost Mattermost Confluence Plugin (Medium)
- CVE-2025-54525: CWE-1287: Improper Validation of Specified Type of Input in Mattermost Mattermost Confluence Plugin (High)
- CVE-2025-54478: CWE-306: Missing Authentication for Critical Function in Mattermost Mattermost Confluence Plugin (High)
- CVE-2025-54463: CWE-754: Improper Check for Unusual or Exceptional Conditions in Mattermost Mattermost Confluence Plugin (Medium)
- CVE-2025-54458: CWE-862: Missing Authorization in Mattermost Mattermost Confluence Plugin (Medium)