CVE-2025-62164: CWE-20: Improper Input Validation in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.10.2 up to but not including 0.11.1, a memory corruption vulnerability that could lead to a crash (denial-of-service) and potentially remote code execution (RCE) exists in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default. As a result, maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to code execution on the server hosting vLLM. This issue has been patched in version 0.11.1.
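The vulnerable pattern described above can be sketched roughly as follows. Function and variable names here are hypothetical illustrations, not vLLM's actual code:

```python
import io

import torch


def handle_prompt_embeds(raw: bytes) -> torch.Tensor:
    """Hypothetical sketch of the vulnerable code path (not vLLM's actual code)."""
    # torch.load() deserializes whatever tensor the client sent. Under
    # PyTorch >= 2.8.0, a sparse tensor arrives with its invariants
    # (index bounds, nnz consistency) unchecked by default.
    tensor = torch.load(io.BytesIO(raw))
    # For a crafted sparse tensor with out-of-range indices, this
    # densification step is where the out-of-bounds write occurs.
    return tensor.to_dense()
```

For an honest, well-formed dense tensor this round-trips harmlessly; the danger is that nothing in the path above distinguishes that case from a malicious sparse payload.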
AI Analysis
Technical Summary
CVE-2025-62164 affects the vLLM project, an inference and serving engine for large language models, specifically versions from 0.10.2 up to but not including 0.11.1. The vulnerability stems from improper input validation and unsafe deserialization of user-supplied prompt embeddings in the Completions API endpoint. The endpoint uses torch.load() to deserialize tensors without sufficient validation. Starting with PyTorch 2.8.0, sparse tensor integrity checks are disabled by default, which allows maliciously crafted sparse tensors to bypass internal bounds checks. When the vulnerable code calls to_dense() on these tensors, an out-of-bounds memory write can occur, leading to memory corruption. This can cause the vLLM process to crash, resulting in denial-of-service, or potentially enable remote code execution on the host server.

The vulnerability is exploitable remotely over the network with low privileges and does not require user interaction, increasing its risk profile. The issue has been addressed in vLLM version 0.11.1 by restoring proper validation and handling of sparse tensors during deserialization. No known exploits have been reported in the wild yet, but the high CVSS score of 8.8 reflects the severe impact and ease of exploitation.

The vulnerability involves multiple CWEs: CWE-20 (Improper Input Validation), CWE-123 (Write-What-Where Condition), CWE-502 (Deserialization of Untrusted Data), and CWE-787 (Out-of-bounds Write).
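A hardened loader for this class of input can be sketched as follows. This is a minimal illustration of the kind of checks the fix applies, not vLLM's actual patch; the function name and the accepted shapes/dtypes are assumptions for the example:

```python
import io

import torch


def load_prompt_embeds_safely(raw: bytes) -> torch.Tensor:
    """Hypothetical hardened loader for user-supplied prompt embeddings.

    Illustrates the class of checks needed here: restrict the unpickler
    to tensor data, then reject anything that is not a plain dense
    floating-point tensor before it can reach to_dense() or compute.
    """
    # weights_only=True restricts torch.load to tensors and primitive
    # containers, blocking arbitrary-object pickle payloads. Note that it
    # still admits sparse tensors, so it is not sufficient on its own.
    tensor = torch.load(io.BytesIO(raw), weights_only=True)

    if not isinstance(tensor, torch.Tensor):
        raise ValueError("payload is not a tensor")
    # Reject sparse layouts outright: with PyTorch >= 2.8.0 their internal
    # invariants are no longer verified by default, so crafted indices can
    # drive an out-of-bounds write during densification.
    if tensor.layout != torch.strided:
        raise ValueError(f"unsupported layout: {tensor.layout}")
    if tensor.dtype not in (torch.float16, torch.bfloat16, torch.float32):
        raise ValueError(f"unsupported dtype: {tensor.dtype}")
    if tensor.dim() != 2:  # e.g. (seq_len, hidden_size) for prompt embeddings
        raise ValueError(f"expected a 2-D embedding tensor, got {tensor.dim()}-D")
    return tensor
```

The key design point is that validation happens on the deserialized object's metadata (layout, dtype, rank) before any operation that trusts the tensor's internal indices is invoked.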
Potential Impact
For European organizations, this vulnerability poses significant risks to the confidentiality, integrity, and availability of AI inference services relying on vLLM. Exploitation could allow attackers to execute arbitrary code on servers hosting vLLM, potentially leading to data breaches, unauthorized access to sensitive AI models or data, and disruption of AI-powered services. Organizations using vLLM in cloud or on-premises environments for critical AI workloads may face service outages or compromise of intellectual property. The vulnerability's network accessibility and lack of user interaction requirements increase the attack surface, especially for publicly exposed API endpoints. Given the growing adoption of AI technologies across Europe in sectors such as finance, healthcare, and manufacturing, the impact could extend to critical infrastructure and sensitive data processing. The absence of known exploits currently provides a window for proactive mitigation, but the high severity demands urgent attention.
Mitigation Recommendations
1. Immediately upgrade all vLLM deployments to version 0.11.1 or later, where the vulnerability is patched.
2. Restrict access to the Completions API endpoint by implementing strong authentication and network segmentation, limiting API calls to trusted users and internal networks only.
3. Employ input validation and sanitization at the application layer to detect and reject malformed or suspicious tensor data before deserialization.
4. Monitor API usage logs for anomalous or unexpected requests that could indicate exploitation attempts.
5. If upgrading is not immediately feasible, consider disabling or restricting the use of the Completions API endpoint to reduce exposure.
6. Review and update incident response plans to include detection and mitigation steps for potential exploitation of deserialization vulnerabilities.
7. Coordinate with cloud providers or third-party vendors to ensure patched versions are deployed in managed environments.
8. Stay informed on threat intelligence updates regarding any emerging exploits targeting this vulnerability.
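As a defense-in-depth measure while an upgrade is pending, PyTorch's own sparse invariant checking can be re-enabled globally via the real `torch.sparse.check_sparse_tensor_invariants` API. Whether these checks also cover tensors rebuilt inside torch.load() depends on the PyTorch release, so treat this as a hardening layer, not a substitute for upgrading to vLLM 0.11.1:

```python
import torch

# Re-enable the sparse tensor invariant checks that PyTorch 2.8.0 stopped
# applying by default. With checks on, constructing a sparse tensor with
# out-of-range indices fails immediately instead of corrupting memory
# later in to_dense().
torch.sparse.check_sparse_tensor_invariants.enable()

# Example: a COO tensor whose index (5) exceeds the declared size (2, 2)
# is now rejected at construction time.
bad_indices = torch.tensor([[5], [0]])
bad_values = torch.tensor([1.0])
try:
    torch.sparse_coo_tensor(bad_indices, bad_values, (2, 2))
    print("malformed tensor accepted")
except RuntimeError as err:
    print("rejected:", err)
```

The same API can also be used as a context manager (`with torch.sparse.check_sparse_tensor_invariants():`) to scope the checks to the deserialization path only, limiting any performance cost.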
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Ireland, Belgium, Switzerland
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-10-07T16:12:03.425Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 691fc3ff70da09562fa7fc8e
Added to database: 11/21/2025, 1:44:31 AM
Last enriched: 11/21/2025, 2:00:14 AM
Last updated: 11/21/2025, 2:35:48 PM
Related Threats
- CVE-2025-41115: Vulnerability in Grafana Grafana Enterprise (Critical)
- CVE-2025-13432: CWE-863: Incorrect Authorization in HashiCorp Terraform Enterprise (Medium)
- CVE-2025-11127: CWE-639 Authorization Bypass Through User-Controlled Key in Mstoreapp Mobile App (Critical)
- CVE-2025-66115: Improper Control of Filename for Include/Require Statement in PHP Program ('PHP Remote File Inclusion') in MatrixAddons Easy Invoice (Unknown)
- CVE-2025-66114: Missing Authorization in theme funda Show Variations as Single Products Woocommerce (Unknown)