CVE-2025-46570: CWE-208: Observable Timing Discrepancy in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.
AI Analysis
Technical Summary
CVE-2025-46570 is a timing side-channel vulnerability in the vLLM inference and serving engine for large language models (LLMs), affecting versions prior to 0.9.0. The flaw arises from prefix caching in the PagedAttention mechanism during prompt processing: when a new prompt shares a prefix chunk with a previously processed input, the prefill phase is accelerated, producing a measurably faster Time to First Token (TTFT). An attacker who can submit prompts and time the responses can use this discrepancy to infer whether specific prefixes have been processed before, leaking information about other users' prompt content or the internal state of the serving process. The vulnerability is classified as CWE-208 (Observable Timing Discrepancy) and has been patched in vLLM version 0.9.0. The CVSS v3.1 base score is 2.6, indicating low severity; the vector requires network access (AV:N), high attack complexity (AC:H), low privileges (PR:L), and user interaction (UI:R), with impact limited to confidentiality (C:L) and no impact on integrity or availability. No known exploits are reported in the wild at this time.
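To make the side channel concrete, the following is a minimal probe sketch, assuming a locally running pre-0.9.0 vLLM server exposing the OpenAI-compatible completions API. The endpoint URL, model name, and prompts are illustrative placeholders, and a realistic measurement would repeat each probe many times and compare timing distributions rather than single samples.

```python
# Hedged sketch: observing the prefix-cache timing discrepancy by measuring
# TTFT against a vLLM OpenAI-compatible endpoint. URL/model are placeholders.
import time
import requests

VLLM_URL = "http://localhost:8000/v1/completions"  # hypothetical deployment
MODEL = "example-org/example-model"                # hypothetical model name

def measure_ttft(prompt: str) -> float:
    """Return seconds from request start until the first streamed chunk."""
    start = time.monotonic()
    with requests.post(
        VLLM_URL,
        json={"model": MODEL, "prompt": prompt, "max_tokens": 1, "stream": True},
        stream=True,
        timeout=30,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # first non-empty SSE line ~ first token arriving
                return time.monotonic() - start
    raise RuntimeError("stream ended before any token arrived")

# A guessed prefix that matches a chunk cached from another request should
# show a consistently lower TTFT than an unrelated control prompt.
candidate = measure_ttft("The patient's diagnosis is")
control = measure_ttft("zqxv unrelated filler text")
print(f"candidate TTFT={candidate:.4f}s  control TTFT={control:.4f}s")
```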
Potential Impact
For European organizations utilizing vLLM versions prior to 0.9.0, this vulnerability could allow attackers to glean partial information about the prompts being processed by observing timing differences in responses. While the confidentiality impact is low, in sensitive environments where prompt content may include proprietary, personal, or confidential data, even limited leakage could be problematic. The vulnerability does not affect integrity or availability, so the risks of operational disruption or data manipulation are minimal. However, organizations deploying LLM inference services in sectors such as finance, healthcare, or government, where data sensitivity is high, should take this risk seriously. The requirement for user interaction and at least low privileges reduces the likelihood of widespread exploitation, but targeted attacks remain possible, especially in the multi-tenant or cloud-hosted inference environments common in Europe.
Mitigation Recommendations
European organizations should upgrade all vLLM deployments to version 0.9.0 or later to eliminate this timing side-channel vulnerability. For environments where immediate upgrading is not feasible, consider implementing network-level controls to restrict access to the inference service, limiting exposure to untrusted users. Monitoring and logging of inference request patterns may help detect anomalous probing attempts exploiting timing differences. Additionally, organizations can introduce artificial delays or jitter in response times to obscure timing discrepancies, though this may impact performance. For highly sensitive use cases, isolating inference workloads and employing strict authentication and authorization mechanisms can further reduce risk. Finally, maintain awareness of vendor updates and security advisories related to vLLM and similar LLM serving engines.
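As an illustration of the jitter suggestion above, the sketch below enforces a randomized minimum response time in an async serving layer placed in front of vLLM. Here `handler` is a hypothetical stand-in for the upstream inference call, and the delay bounds are assumed tuning values, not recommendations from the advisory.

```python
# Hedged sketch of the response-jitter mitigation: enforce a randomized
# minimum latency so cache-hit and cache-miss prefills are harder to
# distinguish. Illustrative proxy-layer glue, not part of vLLM itself.
import asyncio
import random

JITTER_MIN_S = 0.05  # assumed tuning values; trade privacy vs. latency
JITTER_MAX_S = 0.25

async def handler(prompt: str) -> str:
    """Stand-in for forwarding the request to the vLLM backend."""
    await asyncio.sleep(0.01)  # pretend inference
    return f"completion for: {prompt!r}"

async def jittered_handler(prompt: str) -> str:
    # Run the upstream call concurrently with a random sleep and wait for
    # both: the response takes at least `delay` seconds, masking any
    # prefill speedup that finishes under the floor.
    delay = random.uniform(JITTER_MIN_S, JITTER_MAX_S)
    result, _ = await asyncio.gather(handler(prompt), asyncio.sleep(delay))
    return result

print(asyncio.run(jittered_handler("hello")))
```

A randomized floor is used rather than a purely additive delay because independent additive jitter can be averaged away by an attacker who repeats probes and compares means.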
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Belgium
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-04-24T21:10:48.175Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 68388f0b182aa0cae285909c
Added to database: 5/29/2025, 4:44:59 PM
Last enriched: 7/7/2025, 11:09:48 PM
Last updated: 11/21/2025, 5:14:04 AM
Related Threats
CVE-2025-64310: Improper restriction of excessive authentication attempts in SEIKO EPSON CORPORATION EPSON WebConfig for SEIKO EPSON Projector Products (Critical)
CVE-2025-64762: CWE-524: Use of Cache Containing Sensitive Information in workos authkit-nextjs (High)
CVE-2025-64755: CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection') in anthropics claude-code (High)
CVE-2025-62426: CWE-770: Allocation of Resources Without Limits or Throttling in vllm-project vllm (Medium)
CVE-2025-62372: CWE-129: Improper Validation of Array Index in vllm-project vllm (High)