CVE-2025-62426: CWE-770: Allocation of Resources Without Limits or Throttling in vllm-project vllm

Severity: Medium
Published: Fri Nov 21 2025 (11/21/2025, 01:21:29 UTC)
Source: CVE Database V5
Vendor/Project: vllm-project
Product: vllm

Description

vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
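
For illustration, the snippet below shows where chat_template_kwargs sits in a request to the OpenAI-compatible endpoint. The endpoint path and parameter name come from the advisory; the server URL, model name, and kwarg values are placeholders, since the advisory does not publish a triggering payload.

    # Illustrative request shape only; the specific kwargs that trigger the
    # blocking behavior are deliberately not reproduced here.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",  # assumed local vLLM server
        json={
            "model": "my-model",  # placeholder model name
            "messages": [{"role": "user", "content": "Hello"}],
            # In affected versions (>=0.5.5, <0.11.1), these request-supplied
            # kwargs reach the chat template before being validated against it.
            "chat_template_kwargs": {"some_key": "some_value"},
        },
        timeout=30,
    )
    print(resp.status_code)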

AI-Powered Analysis

Last updated: 11/28/2025, 04:41:49 UTC

Technical Analysis

CVE-2025-62426 is a resource exhaustion vulnerability, classified under CWE-770 (Allocation of Resources Without Limits or Throttling), in the vllm-project's vllm inference and serving engine for large language models. It affects versions from 0.5.5 up to but not including 0.11.1. The root cause is improper validation of the chat_template_kwargs parameter accepted by the /v1/chat/completions and /tokenize API endpoints: the parameter is consumed by the code before it is validated against the chat template, so an attacker who can reach these endpoints can supply crafted chat_template_kwargs that trigger excessive resource consumption.

In practice, crafted parameters can block the API server's request processing for extended periods, delaying or denying service to all other legitimate requests. The vulnerability does not impact confidentiality or integrity but severely impacts availability. Exploitation requires network access and privileges sufficient to invoke the API endpoints, but no user interaction, and the issue carries a CVSS v3.1 score of 6.5 (medium severity).

The vulnerability was publicly disclosed on November 21, 2025, and patched in vllm version 0.11.1. No exploits have been reported in the wild to date. It is particularly relevant for organizations deploying vllm as part of their AI infrastructure, especially those providing LLM inference services at scale.
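
The advisory does not describe the exact template interaction, but vLLM chat templates are Jinja2 templates, and the CWE-770 pattern is easy to see in that setting. The sketch below is an illustration of the general pattern under that assumption, not vLLM's actual code path: a template variable that controls iteration lets the caller buy arbitrary CPU time during rendering.

    # Hedged illustration of the CWE-770 pattern; not vLLM's actual template.
    import time
    from jinja2 import Template

    # Hypothetical template in which a request-supplied kwarg drives a loop.
    template = Template("{% for i in range(n) %}x{% endfor %}")

    start = time.monotonic()
    template.render(n=50_000_000)  # attacker-chosen value forces a long render
    print(f"render blocked for {time.monotonic() - start:.1f}s")

If work like this runs on the API server's request-handling path, every other request queues behind it, which matches the blocking behavior the advisory describes.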

Potential Impact

For European organizations, this vulnerability primarily threatens the availability of AI inference services built on the vllm engine. Organizations running affected versions may experience denial of service, causing outages or degraded performance in LLM-based applications. This can disrupt business operations, especially in sectors such as finance, healthcare, research, and technology, where AI-driven services are increasingly critical: delays in API processing can affect customer-facing applications, internal automation, and decision-making systems, and prolonged disruption can bring reputational damage and financial loss.

Because exploitation requires credentials to invoke the API, insider threats or compromised credentials are the most likely avenues of abuse. The lack of confidentiality and integrity impact limits data breach risk, but loss of availability alone can have severe operational consequences, and organizations bound by strict service-level agreements (SLAs) or regulatory uptime requirements may face compliance exposure if affected.

Mitigation Recommendations

To mitigate this vulnerability, European organizations should:

- Upgrade all vllm deployments promptly to version 0.11.1 or later, where the issue is patched.
- Validate and sanitize the chat_template_kwargs parameter strictly, or reject it outright at a proxy layer, so that malformed or malicious payloads cannot trigger resource exhaustion (see the sketch after this list).
- Apply rate limiting and throttling to the affected endpoints (/v1/chat/completions and /tokenize) to bound the impact of any single client or request.
- Monitor API usage for anomalies indicative of exploitation attempts, such as unusually long processing times or repeated requests with suspicious parameters.
- Restrict access to the API endpoints to trusted users and systems, enforcing strong authentication and authorization controls.
- Limit exposure of the inference engine with network segmentation and firewall rules so that only necessary clients can reach it.
- Regularly audit and review logs for signs of abuse or performance degradation.
- Maintain an incident response plan that includes procedures for mitigating denial of service conditions affecting AI services.
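
As an interim guard until an upgrade is possible, one option is a thin filter in front of the server. The sketch below is an assumption, not part of vLLM: a Starlette ASGI middleware, mounted in a reverse proxy, that rejects POST requests to the two affected paths when the body carries chat_template_kwargs. Only the endpoint paths come from the advisory.

    # Hedged sketch of a proxy-side guard for affected deployments.
    import json

    from starlette.middleware.base import BaseHTTPMiddleware
    from starlette.requests import Request
    from starlette.responses import JSONResponse

    GUARDED_PATHS = {"/v1/chat/completions", "/tokenize"}  # from the advisory

    class ChatTemplateKwargsGuard(BaseHTTPMiddleware):
        async def dispatch(self, request: Request, call_next):
            if request.method == "POST" and request.url.path in GUARDED_PATHS:
                try:
                    payload = json.loads(await request.body() or b"{}")
                except json.JSONDecodeError:
                    payload = {}
                if isinstance(payload, dict) and "chat_template_kwargs" in payload:
                    return JSONResponse(
                        {"error": "chat_template_kwargs is disabled"},
                        status_code=400,
                    )
            return await call_next(request)

Rejecting the field outright is cruder than validating it against the chat template, but it removes the attacker-controlled input entirely and is easy to reason about; pair it with rate limiting on the same two paths.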

Technical Details

Data Version: 5.2
Assigner Short Name: GitHub_M
Date Reserved: 2025-10-13T16:26:12.180Z
CVSS Version: 3.1
State: PUBLISHED

Threat ID: 691fc3ff70da09562fa7fc9b

Added to database: 11/21/2025, 1:44:31 AM

Last enriched: 11/28/2025, 4:41:49 AM

Last updated: 1/8/2026, 11:36:44 AM
