CVE-2025-62164: CWE-20: Improper Input Validation in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). From version 0.10.2 up to but not including 0.11.1, the Completions API endpoint contains a memory corruption vulnerability that can lead to a crash (denial of service) and potentially remote code execution (RCE). When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
AI Analysis
Technical Summary
CVE-2025-62164 is a critical vulnerability affecting the vLLM inference and serving engine for large language models, specifically versions from 0.10.2 up to but not including 0.11.1. The root cause is improper input validation when processing user-supplied prompt embeddings through the Completions API endpoint. This endpoint uses torch.load() to deserialize serialized tensor data without sufficient validation. Starting with PyTorch 2.8.0, sparse tensor integrity checks are disabled by default, which allows maliciously crafted sparse tensors to bypass internal bounds checks. When the vulnerable vLLM calls to_dense() on these tensors, an out-of-bounds memory write occurs, leading to memory corruption. This can cause the vLLM process to crash, resulting in denial-of-service (DoS). More critically, the memory corruption can be exploited to achieve remote code execution (RCE) on the server hosting vLLM, potentially allowing attackers to execute arbitrary code with the privileges of the vLLM process. The vulnerability requires network access and low privileges (PR:L), but no user interaction is needed, making it relatively easy to exploit remotely. The issue has been addressed and patched in vLLM version 0.11.1. No known exploits are currently reported in the wild, but the high CVSS score of 8.8 indicates a significant risk. The vulnerability is associated with multiple CWEs: CWE-20 (Improper Input Validation), CWE-123 (Write-what-where Condition), CWE-502 (Deserialization of Untrusted Data), and CWE-787 (Out-of-bounds Write).
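The failure class described above can be illustrated with a minimal sketch. This is not vLLM or PyTorch code: it models a "sparse tensor" as a declared length plus (index, value) pairs, and shows why converting it to a dense buffer while trusting the supplied indices is dangerous, and what the invariant check that PyTorch 2.8.0 disabled by default conceptually enforces.

```python
# Illustrative sketch only -- not vLLM or PyTorch internals.

def to_dense_unchecked(length, entries):
    """Naive sparse-to-dense conversion that trusts attacker-controlled
    indices, analogous to running with sparse tensor integrity checks
    disabled. In native code, a bad index becomes an out-of-bounds write."""
    dense = [0.0] * length
    for idx, val in entries:
        dense[idx] = val  # no bounds check on idx
    return dense

def to_dense_checked(length, entries):
    """Validated conversion: every index is checked against the declared
    length before any write, mirroring what sparse invariant checks enforce."""
    for idx, _ in entries:
        if not (0 <= idx < length):
            raise ValueError(f"index {idx} out of bounds for length {length}")
    return to_dense_unchecked(length, entries)
```

In Python an out-of-range index raises an exception (or, for negative indices, silently aliases another slot); in the native tensor code the same trusted index arithmetic produces the out-of-bounds memory write that makes this CVE a potential RCE rather than just a crash.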
Potential Impact
For European organizations, the impact of CVE-2025-62164 is substantial, especially for those deploying vLLM as part of AI inference services or internal machine learning workflows. Exploitation can lead to denial-of-service conditions, disrupting critical AI-driven applications and services, which may affect business continuity and operational efficiency. More severely, successful remote code execution could allow attackers to gain control over the server environment, leading to data breaches, lateral movement within networks, and potential compromise of sensitive intellectual property or personal data. This is particularly concerning for sectors relying heavily on AI, such as finance, healthcare, telecommunications, and government agencies. The vulnerability's network accessibility and lack of user interaction requirement increase the attack surface. Given the growing adoption of AI technologies in Europe, the risk to confidentiality, integrity, and availability of systems is high. Additionally, disruption or compromise of AI services could undermine trust in AI deployments and cause regulatory and compliance issues under GDPR and other data protection laws.
Mitigation Recommendations
To mitigate CVE-2025-62164, organizations should immediately upgrade all vLLM deployments to version 0.11.1 or later, where the vulnerability is patched. Until upgrades are complete, restrict network access to the Completions API endpoint to trusted users and systems only, employing network segmentation and firewall rules. Implement strict input validation and sanitization on all user-supplied tensor data before deserialization, potentially adding custom checks to detect malformed or suspicious tensor structures. Monitor logs and network traffic for anomalous tensor payloads or unusual API usage patterns that could indicate exploitation attempts. Employ runtime protections such as containerization or sandboxing of vLLM processes to limit the impact of potential code execution. Regularly audit and update dependencies, including PyTorch, to ensure security features like sparse tensor integrity checks are enabled. Finally, maintain an incident response plan tailored to AI service compromises, including rapid isolation and forensic analysis capabilities.
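As a sketch of the "strict input validation before deserialization" recommendation above, a gateway or middleware layer could apply coarse checks to an embedding payload before it ever reaches torch.load(). The sketch below assumes the payload arrives base64-encoded, and relies on the fact that torch.save() writes a ZIP archive by default (PK\x03\x04 magic); the size cap is a hypothetical deployment-specific value. Note this is defense in depth only: a crafted sparse tensor is still a well-formed archive, so these checks complement, not replace, upgrading to 0.11.1.

```python
import base64

MAX_PAYLOAD_BYTES = 8 * 1024 * 1024  # hypothetical per-request cap

def precheck_embedding_payload(b64_payload: str) -> bytes:
    """Coarse pre-deserialization checks applied before torch.load():
    strict base64 decoding, a size cap, and a container-format check."""
    raw = base64.b64decode(b64_payload, validate=True)
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("embedding payload exceeds size cap")
    # torch.save() uses a zipfile container by default; reject anything else.
    if not raw.startswith(b"PK\x03\x04"):
        raise ValueError("payload is not a torch.save() zip archive")
    return raw
```

After this precheck, loading with torch.load(..., weights_only=True) and rejecting any resulting tensor whose layout is not dense (strided) further narrows the attack surface on unpatched versions.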
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-10-07T16:12:03.425Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 691fc3ff70da09562fa7fc8e
Added to database: 11/21/2025, 1:44:31 AM
Last enriched: 11/28/2025, 4:41:18 AM
Last updated: 1/7/2026, 4:48:14 AM