
CVE-2025-62426: CWE-770: Allocation of Resources Without Limits or Throttling in vllm-project vllm

Severity: Medium
Published: Fri Nov 21 2025 (11/21/2025, 01:21:29 UTC)
Source: CVE Database V5
Vendor/Project: vllm-project
Product: vllm

Description

vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
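For context, the parameter travels in the JSON body of a standard OpenAI-compatible request. The following minimal sketch shows only the request shape; the server URL and model name are assumptions, and no actual denial-of-service payload is included:

import requests  # third-party HTTP client

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local vLLM server
    json={
        "model": "my-model",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello"}],
        # In affected versions (>= 0.5.5, < 0.11.1), these kwargs reach the
        # chat template before being validated against it.
        "chat_template_kwargs": {"some_template_var": "value"},
    },
    timeout=30,
)
print(resp.status_code)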

AI-Powered Analysis

Last updated: 11/21/2025, 02:00:29 UTC

Technical Analysis

CVE-2025-62426 is a resource exhaustion vulnerability classified under CWE-770 (Allocation of Resources Without Limits or Throttling) in vllm, the vllm-project's inference and serving engine for large language models. Versions from 0.5.5 up to, but not including, 0.11.1 improperly handle the chat_template_kwargs parameter in the /v1/chat/completions and /tokenize endpoints: the parameter is used in the code prior to proper validation against the chat template, allowing an attacker with low privileges to craft malicious requests that cause the server to allocate excessive resources or enter prolonged processing states.

The result is denial of service: the API server is blocked from processing other legitimate requests, causing service delays or outages. The vulnerability does not compromise confidentiality or integrity but severely impacts availability. Exploitation requires network access and low privileges, but no user interaction.

The vulnerability was publicly disclosed on November 21, 2025, with a CVSS v3.1 base score of 6.5 (medium severity). No exploits are currently known in the wild. The issue is addressed in vllm version 0.11.1, which validates the chat_template_kwargs parameter before use. This vulnerability is particularly relevant for organizations deploying vllm as part of their AI infrastructure, especially those providing LLM inference services via API endpoints.
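Since exposure is determined entirely by the installed version, administrators can confirm whether a deployment falls in the affected range with a quick check. A minimal sketch, assuming vllm is installed in the current Python environment and the third-party packaging library is available:

from importlib.metadata import version
from packaging.version import Version

# Affected range per the advisory: >= 0.5.5 and < 0.11.1.
installed = Version(version("vllm"))
affected = Version("0.5.5") <= installed < Version("0.11.1")
print(f"vllm {installed}: {'AFFECTED - upgrade to >= 0.11.1' if affected else 'not in the affected range'}")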

Potential Impact

For European organizations, the primary impact of CVE-2025-62426 is on service availability. Organizations relying on vllm for large language model inference may experience denial-of-service conditions, leading to delays or outages in AI-driven applications such as chatbots, automated customer support, or data processing pipelines. This can disrupt business operations, degrade user experience, and potentially cause financial losses or reputational damage.

Since the vulnerability does not affect confidentiality or integrity, data breaches are unlikely. However, the denial of service could be exploited as part of a broader attack strategy to disrupt critical AI services, and organizations in sectors with high dependency on AI, including finance, healthcare, and telecommunications, may face operational risks. Because only low privileges are required, insider threats or compromised accounts could be leveraged to trigger the attack. The current lack of known exploits in the wild reduces immediate risk but does not eliminate the potential for future attacks, especially as vllm adoption grows in Europe.

Mitigation Recommendations

To mitigate CVE-2025-62426, European organizations should:

- Upgrade all vllm deployments to version 0.11.1 or later, where the vulnerability is patched.
- Implement strict input validation and sanitization at the API gateway level to detect and block malformed or suspicious chat_template_kwargs parameters before they reach the vllm service (a minimal gateway-level filter is sketched below).
- Enforce rate limiting and throttling on the /v1/chat/completions and /tokenize endpoints to prevent resource exhaustion from excessive or malformed requests.
- Configure monitoring and alerting to detect unusual spikes in request processing times or resource usage on these endpoints.
- Review access controls to ensure that only authorized users with a legitimate need can invoke these endpoints, minimizing the risk of exploitation by low-privilege users.
- Use network segmentation and web application firewalls (WAFs) for additional layers of defense.
- Maintain an incident response plan that includes procedures for handling denial-of-service incidents affecting AI inference services.
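Where an immediate upgrade is not possible, the gateway-level filter mentioned above can be approximated with HTTP middleware placed in front of the vLLM server. This is a defensive sketch, not vLLM's own code: the endpoint paths and parameter name come from the advisory, while the framework choice, response wording, and deployment details are assumptions.

import json
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()  # a gateway app sitting in front of the vLLM server
BLOCKED_PATHS = {"/v1/chat/completions", "/tokenize"}

@app.middleware("http")
async def reject_chat_template_kwargs(request: Request, call_next):
    # Only inspect the endpoints named in the advisory.
    if request.url.path in BLOCKED_PATHS:
        raw = await request.body()  # cached by Starlette for downstream reuse
        try:
            payload = json.loads(raw) if raw else {}
        except json.JSONDecodeError:
            payload = {}
        if isinstance(payload, dict) and "chat_template_kwargs" in payload:
            return JSONResponse(
                status_code=400,
                content={"error": "chat_template_kwargs is not accepted here"},
            )
    # Forwarding/proxying to the upstream vLLM server is omitted from this sketch.
    return await call_next(request)

The same check can equally live in a WAF rule or reverse-proxy configuration. On patched deployments (0.11.1 and later) the parameter is validated server-side, so a filter like this becomes defense in depth rather than the primary control.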


Technical Details

Data Version
5.2
Assigner Short Name
GitHub_M
Date Reserved
2025-10-13T16:26:12.180Z
Cvss Version
3.1
State
PUBLISHED

Threat ID: 691fc3ff70da09562fa7fc9b

Added to database: 11/21/2025, 1:44:31 AM

Last enriched: 11/21/2025, 2:00:29 AM

Last updated: 11/21/2025, 1:55:35 PM
