CVE-2025-46570: CWE-208: Observable Timing Discrepancy in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.
AI Analysis
Technical Summary
CVE-2025-46570 is a timing side-channel vulnerability in the vLLM inference and serving engine for large language models (LLMs), affecting versions prior to 0.9.0. The flaw stems from prefix caching in the PagedAttention mechanism (written "PageAttention" in the advisory): when a new prompt shares a prefix chunk with a previously processed input, the cached chunk is reused and the prefill phase completes faster, producing a measurably lower Time to First Token (TTFT). An attacker who can submit prompts and observe TTFT can therefore test whether a guessed prefix matches content another user has recently submitted, leaking information about prompt contents or the internal state of the serving process. The weakness is classified as CWE-208 (Observable Timing Discrepancy). The issue has been patched in vLLM 0.9.0. The CVSS v3.1 base score is 2.6, indicating low severity, with vector AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N: network attack vector, high attack complexity, low privileges required, user interaction required, low confidentiality impact, and no impact on integrity or availability. No exploits are known in the wild at this time.
Potential Impact
For European organizations running vLLM versions prior to 0.9.0, this vulnerability could allow attackers to glean partial information about the prompts being processed by observing timing differences in responses. While the confidentiality impact is rated low, in environments where prompt content may include proprietary, personal, or otherwise confidential data, even limited leakage could be problematic. The vulnerability does not affect integrity or availability, so the risk of operational disruption or data manipulation is minimal. Organizations deploying LLM inference services in sectors with high data sensitivity, such as finance, healthcare, or government, should nonetheless take this risk seriously. The requirements for user interaction and authenticated (low-privilege) access reduce the likelihood of widespread exploitation, but targeted attacks remain possible, especially in the multi-tenant or cloud-hosted inference environments common in Europe.
Mitigation Recommendations
European organizations should upgrade all vLLM deployments to version 0.9.0 or later to eliminate this timing side-channel vulnerability. For environments where immediate upgrading is not feasible, consider implementing network-level controls to restrict access to the inference service, limiting exposure to untrusted users. Monitoring and logging of inference request patterns may help detect anomalous probing attempts exploiting timing differences. Additionally, organizations can introduce artificial delays or jitter in response times to obscure timing discrepancies, though this may impact performance. For highly sensitive use cases, isolating inference workloads and employing strict authentication and authorization mechanisms can further reduce risk. Finally, maintain awareness of vendor updates and security advisories related to vLLM and similar LLM serving engines.
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Belgium
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-04-24T21:10:48.175Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 68388f0b182aa0cae285909c
Added to database: 5/29/2025, 4:44:59 PM
Last enriched: 7/7/2025, 11:09:48 PM
Last updated: 8/10/2025, 3:03:35 PM
Related Threats
- CVE-2025-52621: CWE-346 Origin Validation Error in HCL Software BigFix SaaS Remediate (Medium)
- CVE-2025-52620: CWE-20 Improper Input Validation in HCL Software BigFix SaaS Remediate (Medium)
- CVE-2025-52619: CWE-209 Generation of Error Message Containing Sensitive Information in HCL Software BigFix SaaS Remediate (Medium)
- CVE-2025-52618: CWE-89 Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection') in HCL Software BigFix SaaS Remediate (Medium)
- CVE-2025-43201: An app may be able to unexpectedly leak a user's credentials in Apple Apple Music Classical for Android (High)