CVE-2025-38104: Vulnerability in Linux Linux
In the Linux kernel, the following vulnerability has been resolved: drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV. RLCG register access is a way for virtual functions (VFs) to safely access GPU registers in a virtualized environment, including for TLB flushes and register reads. When multiple threads or VFs try to access the same registers simultaneously, race conditions can result. By going through the RLCG interface, the driver serializes access so that only one thread touches the registers at a time, which prevents conflicts and ensures that operations are performed correctly. Additionally, when a low-priority task holds a mutex that a high-priority task needs, priority inversion can occur, and if a thread that already holds a spinlock tries to acquire a mutex, it may sleep in a context where sleeping is not allowed. This matters because register access in amdgpu_virt_rlcg_reg_rw sits on a fast code path. The call stack shows amdgpu_virt_rlcg_reg_rw attempting to acquire the mutex. The function is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb. The [ BUG: Invalid wait context ] report indicates that a thread is trying to acquire a mutex in a context that does not allow it to sleep (such as while holding a spinlock).
Fixes the below:
[ 253.013423] =============================
[ 253.013434] [ BUG: Invalid wait context ]
[ 253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[ 253.013464] -----------------------------
[ 253.013475] kworker/0:1/10 is trying to lock:
[ 253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.013815] other info that might help us debug this:
[ 253.013827] context-{4:4}
[ 253.013835] 3 locks held by kworker/0:1/10:
[ 253.013847] #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[ 253.013877] #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[ 253.013905] #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[ 253.014154] stack backtrace:
[ 253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE 6.12.0-amdstaging-drm-next-lol-050225 #14
[ 253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[ 253.014224] Workqueue: events work_for_cpu_fn
[ 253.014241] Call Trace:
[ 253.014250] <TASK>
[ 253.014260] dump_stack_lvl+0x9b/0xf0
[ 253.014275] dump_stack+0x10/0x20
[ 253.014287] __lock_acquire+0xa47/0x2810
[ 253.014303] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.014321] lock_acquire+0xd1/0x300
[ 253.014333] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.014562] ? __lock_acquire+0xa6b/0x2810
[ 253.014578] __mutex_lock+0x85/0xe20
[ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.014782] ? sched_clock_noinstr+0x9/0x10
[ 253.014795] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.014808] ? local_clock_noinstr+0xe/0xc0
[ 253.014822] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015012] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.015029] mutex_lock_nested+0x1b/0x30
[ 253.015044] ? mutex_lock_nested+0x1b/0x30
[ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[ 253.015901] ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[ 253.016159] ? srso_alias_return_thunk+0x5/0xfbef5
[ 253.016173] ? smu_hw_init+0x18d/0x300 [amdgpu]
[ 253.016403] amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[ 253.016614] amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[ 253.0170 ---truncated---
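When reading a trace like the one above, it can help to separate the frames the unwinder considers reliable from those prefixed with "?", which are addresses found on the stack but not verified as part of the call chain. The following is a minimal sketch of that filtering. The function name reliable_frames and the regular expression are illustrative assumptions, not part of any kernel tooling, and the sample input is a subset of frames copied from the report.

```python
import re
from typing import List, Optional, Tuple

# A subset of frames copied from the report; timestamps kept as logged.
TRACE = """\
[ 253.014578] __mutex_lock+0x85/0xe20
[ 253.014591] ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015029] mutex_lock_nested+0x1b/0x30
[ 253.015044] ? mutex_lock_nested+0x1b/0x30
[ 253.015057] amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[ 253.015249] amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[ 253.015435] gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[ 253.015667] gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
"""

# Timestamp, optional "?" marker, symbol+offset/size, optional [module].
# This pattern is an assumption fitted to the log format shown in this report.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(trace: str) -> List[Tuple[str, Optional[str]]]:
    """Return (symbol, module) for frames that are not marked with '?'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group("guess"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames


if __name__ == "__main__":
    # Innermost frame first, as in the kernel log.
    for symbol, module in reliable_frames(TRACE):
        print(symbol, f"[{module}]" if module else "")
```

Read bottom to top, the filtered frames give the call chain described in the advisory: gfx_v11_0_hw_init, gmc_v11_0_flush_gpu_tlb, amdgpu_sriov_wreg, amdgpu_virt_rlcg_reg_rw, and then the mutex acquisition.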
AI Analysis
Technical Summary
CVE-2025-38104 is a vulnerability in the Linux kernel's amdgpu driver. It concerns RLCG register access, the interface that SR-IOV (Single Root I/O Virtualization) virtual functions use to read and write GPU registers in virtualized environments.

The amdgpu_virt_rlcg_reg_rw function used a mutex (adev->virt.rlcg_reg_lock) to serialize this access. A mutex may sleep, but the function is also reached from contexts where sleeping is not allowed: in the reported trace, gmc_v11_0_flush_gpu_tlb calls amdgpu_sriov_wreg while already holding the adev->gmc.invalidate_lock spinlock. Acquiring the mutex there triggers the "BUG: Invalid wait context" report and, according to the upstream commit, exposes the path to priority inversion. The fix replaces the mutex with a spinlock so that register access on this path never sleeps.

This record does not enumerate the affected kernel versions. The trace was captured on a 6.12.0 amd-staging-drm-next build running as a Hyper-V guest, during GPU initialization (amdgpu_device_init, gfx_v11_0_hw_init) and a GPU TLB flush. Exposure is limited to systems that use AMD GPUs with SR-IOV GPU virtualization enabled, such as cloud or virtualized data centers where multiple virtual functions or threads access GPU registers concurrently.

The issue manifests as kernel warnings and instability, and sleeping in an atomic context can potentially lead to hangs or crashes. No exploits are known to have been reported in the wild; the most plausible consequence is denial of service or system instability on affected systems.
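The locking rule behind this report can be illustrated with a toy model: a task that holds a non-sleeping lock must not request a lock that may sleep. The sketch below is a deliberately simplified illustration and not how the kernel's lock validator is implemented. The names ToyLock, ToyTask, InvalidWaitContext, and flush_tlb are hypothetical; only the two lock names are taken from the trace.

```python
class InvalidWaitContext(Exception):
    """Raised when a lock that may sleep is requested in atomic context."""


class ToyLock:
    def __init__(self, name, may_sleep):
        self.name = name
        # True models a mutex-like lock, False a spinlock-like lock.
        self.may_sleep = may_sleep


class ToyTask:
    """Tracks held locks in the spirit of a (much simplified) lock validator."""

    def __init__(self):
        self.held = []

    def acquire(self, lock):
        # Holding any non-sleeping lock puts the task in atomic context,
        # where taking a lock that may sleep is not allowed.
        blockers = [h.name for h in self.held if not h.may_sleep]
        if lock.may_sleep and blockers:
            raise InvalidWaitContext(
                f"{lock.name} may sleep while holding {blockers}"
            )
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


def flush_tlb(task, invalidate_lock, rlcg_reg_lock):
    """Mirror the nesting in the trace: invalidate_lock, then rlcg_reg_lock."""
    task.acquire(invalidate_lock)
    try:
        task.acquire(rlcg_reg_lock)
        task.release(rlcg_reg_lock)
    finally:
        task.release(invalidate_lock)


if __name__ == "__main__":
    invalidate_lock = ToyLock("gmc.invalidate_lock", may_sleep=False)

    # Before the fix: rlcg_reg_lock behaves like a mutex.
    try:
        flush_tlb(ToyTask(), invalidate_lock, ToyLock("virt.rlcg_reg_lock", True))
    except InvalidWaitContext as exc:
        print("before fix:", exc)

    # After the fix: rlcg_reg_lock behaves like a spinlock.
    flush_tlb(ToyTask(), invalidate_lock, ToyLock("virt.rlcg_reg_lock", False))
    print("after fix: nesting accepted")
```

In this model the only change between the failing and passing runs is the may_sleep flag on the inner lock, which corresponds to the mutex-to-spinlock replacement described in the commit title.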
Potential Impact
For European organizations, especially those operating data centers, cloud services, or virtualization platforms using Linux with AMD GPUs and SR-IOV enabled, this vulnerability poses a risk of system instability and denial of service. The impact includes potential kernel crashes or deadlocks that could disrupt critical workloads, affecting availability of services relying on GPU acceleration or virtualization. Industries such as cloud providers, research institutions, financial services, and media companies that utilize GPU virtualization for compute-intensive tasks could experience operational interruptions. Additionally, organizations using Linux-based virtual desktop infrastructure (VDI) or GPU-accelerated virtual machines may face degraded performance or outages. While the vulnerability does not directly expose confidentiality or integrity risks, the availability impact can be significant, especially in multi-tenant environments where resource sharing is critical. The lack of known exploits reduces immediate risk, but the complexity of the bug and its presence in kernel code used widely across Europe necessitate prompt patching to maintain system reliability and service continuity.
Mitigation Recommendations
1. Apply the Linux kernel fix that replaces the mutex with a spinlock in the amdgpu driver, either from the upstream kernel or through your Linux distribution's kernel updates.
2. If you run a custom or out-of-tree build of the amdgpu driver, update it to a version that includes the fix.
3. If patching is delayed, consider temporarily disabling SR-IOV GPU virtualization features to reduce exposure.
4. Monitor kernel logs for "BUG: Invalid wait context" or related warnings that mention amdgpu_virt_rlcg_reg_rw, which indicate that the bug is being triggered. This report is emitted by the kernel's lock validator (lockdep), so it appears only on kernels built with lock debugging enabled.
5. Implement robust kernel crash and recovery mechanisms to minimize downtime in case of kernel panics.
6. Coordinate with hardware vendors and Linux distribution maintainers to receive timely updates and to verify that patches are compatible with existing infrastructure.
7. Test updated kernels in a staging environment before deploying them to production, to confirm that stability and performance are maintained.
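The log monitoring step can be sketched as a small scanner over kernel log lines. This is a minimal example under stated assumptions: the function name find_invalid_wait_reports and the regular expression are illustrative, and the pattern is fitted to the report format shown in this advisory, which may differ between kernel versions.

```python
import re

MARKER = "BUG: Invalid wait context"

# Matches a lock entry such as:
#   (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330
# The pattern is an assumption based on the log format in this report.
LOCK_RE = re.compile(
    r"\((?P<lock>&[\w.>-]+)\)\{[^}]*\}-\{\d+:\d+\}, at: (?P<func>[\w.]+)\+0x"
)


def find_invalid_wait_reports(lines):
    """Yield (lock, function) for the first lock entry after each marker line.

    In the report format shown here, the first lock entry after the marker
    is the lock being acquired; the held locks are listed afterwards.
    """
    pending = False
    for line in lines:
        if MARKER in line:
            pending = True
            continue
        if pending:
            match = LOCK_RE.search(line)
            if match:
                yield match.group("lock"), match.group("func")
                pending = False


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 scan_wait_context.py  (script name is hypothetical)
    for lock, func in find_invalid_wait_reports(sys.stdin):
        print(f"invalid wait context: {lock} taken in {func}")
```

A report whose function is amdgpu_virt_rlcg_reg_rw and whose lock is &adev->virt.rlcg_reg_lock matches the signature of this CVE; other lock names point to unrelated locking problems.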
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Poland, Italy, Spain
Technical Details
- Data Version: 5.1
- Assigner Short Name: Linux
- Date Reserved: 2025-04-16T04:51:23.985Z
- Cisa Enriched: false
- Cvss Version: null
- State: PUBLISHED
Threat ID: 682d9820c4522896dcbdd484
Added to database: 5/21/2025, 9:08:48 AM
Last enriched: 7/3/2025, 7:27:31 PM
Last updated: 8/14/2025, 9:57:53 AM
Views: 10