CVE-2025-62426: CWE-770: Allocation of Resources Without Limits or Throttling in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
AI Analysis
Technical Summary
CVE-2025-62426 is a resource exhaustion vulnerability, classified under CWE-770 (Allocation of Resources Without Limits or Throttling), in vLLM, the vllm-project's inference and serving engine for large language models. It affects versions from 0.5.5 up to, but not including, 0.11.1. The root cause is that the chat_template_kwargs parameter accepted by the /v1/chat/completions and /tokenize API endpoints is used by the server before it is properly validated against the chat template. An attacker who can reach these endpoints can therefore supply crafted chat_template_kwargs that cause the API server to block processing for extended periods, delaying or denying service to all other requests. The vulnerability does not affect confidentiality or integrity; its impact is limited to availability, in the form of denial of service. Exploitation requires network access and the low privileges needed to invoke the API, but no user interaction. The CVSS v3.1 score of 6.5 (medium severity) reflects a network attack vector, low attack complexity, low privileges required, and an availability-only impact. The issue was publicly disclosed on November 21, 2025, and patched in vLLM version 0.11.1. No exploits have been reported in the wild to date. The vulnerability is most relevant to organizations deploying vLLM as part of their AI infrastructure, especially those providing LLM inference services at scale.
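To make the attack surface concrete, the request below is a minimal sketch of where chat_template_kwargs travels: a standard OpenAI-compatible chat completion call with the extra parameter attached. The URL, model name, and kwargs values are placeholder assumptions for illustration, not a working exploit payload.

```python
# Minimal sketch of a request carrying chat_template_kwargs to a vLLM
# OpenAI-compatible server. All values (URL, model, kwargs) are
# illustrative placeholders, not an exploit.
import requests

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed deployment address

payload = {
    "model": "my-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello"}],
    # In affected versions (>= 0.5.5, < 0.11.1), these keys reach the
    # chat-template rendering code before they are validated.
    "chat_template_kwargs": {"template_var": "attacker-controlled value"},
}

response = requests.post(VLLM_URL, json=payload, timeout=30)
print(response.status_code)
```

The same parameter is accepted by the /tokenize endpoint, so any pre-patch validation needs to cover both paths.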
Potential Impact
For European organizations, this vulnerability poses a significant risk to the availability of AI inference services built on vLLM. Organizations running affected versions may experience denial of service, resulting in outages or degraded performance of LLM-based applications. This can disrupt business operations, particularly in sectors such as finance, healthcare, research, and technology, where AI-driven services are increasingly critical. The inability to process API requests in a timely manner could affect customer-facing applications, internal automation, and decision-support systems, and prolonged disruptions could cause reputational damage and financial losses. Because exploitation requires authenticated access to the API, insider threats or compromised credentials are plausible attack paths. The lack of confidentiality and integrity impact limits data breach risk, but availability impact alone can have severe operational consequences. European organizations with strict service-level agreements (SLAs) or regulatory uptime requirements may face compliance challenges if affected.
Mitigation Recommendations
To mitigate this vulnerability, European organizations should:
- Upgrade all vLLM deployments to version 0.11.1 or later, where the issue is patched; this is the only complete fix.
- Until the upgrade is complete, validate and constrain the chat_template_kwargs parameter at a proxy or gateway so malformed or oversized payloads never reach the server (see the sketch after this list).
- Apply rate limiting and throttling on the affected endpoints (/v1/chat/completions and /tokenize) to bound the impact of any single client or request.
- Monitor API usage for anomalies indicative of exploitation, such as unusually long processing times or repeated requests with suspicious parameters.
- Restrict access to the API endpoints to trusted users and systems, enforcing strong authentication and authorization.
- Use network segmentation and firewall rules so the inference engine is reachable only by the clients that need it.
- Regularly audit and review logs for signs of abuse or performance degradation.
- Maintain an incident response plan that covers denial-of-service conditions affecting AI services.
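As a stopgap for deployments that cannot upgrade immediately, the guard below is a minimal sketch of the proxy-side validation described above: a small FastAPI reverse proxy in front of vLLM that caps chat_template_kwargs before forwarding. The upstream address, guarded path list, and caps (MAX_KWARGS, MAX_VALUE_LEN) are assumptions chosen for illustration, not vLLM settings.

```python
# Stopgap guard sketch: a FastAPI reverse proxy that bounds
# chat_template_kwargs before requests reach an unpatched vLLM server.
import json

import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()

UPSTREAM = "http://127.0.0.1:8000"  # assumed vLLM server address
GUARDED_PATHS = {"/v1/chat/completions", "/tokenize"}
MAX_KWARGS = 8        # arbitrary cap on number of template kwargs
MAX_VALUE_LEN = 256   # arbitrary cap on each value's string length


@app.post("/{path:path}")
async def guard(path: str, request: Request) -> Response:
    body = await request.body()
    if f"/{path}" in GUARDED_PATHS:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return Response("invalid JSON", status_code=400)
        kwargs = payload.get("chat_template_kwargs") or {}
        if not isinstance(kwargs, dict):
            return Response("chat_template_kwargs must be an object",
                            status_code=400)
        if len(kwargs) > MAX_KWARGS or any(
            len(str(v)) > MAX_VALUE_LEN for v in kwargs.values()
        ):
            return Response("chat_template_kwargs rejected by policy",
                            status_code=400)
    # Forward anything that passes the policy to the real server.
    async with httpx.AsyncClient(timeout=60.0) as client:
        upstream = await client.post(
            f"{UPSTREAM}/{path}",
            content=body,
            headers={"content-type":
                     request.headers.get("content-type", "application/json")},
        )
    return Response(content=upstream.content,
                    status_code=upstream.status_code)
```

Run it with, for example, `uvicorn guard:app --port 9000` and point clients at the guard instead of the vLLM server directly. Note that this only bounds the shape of the parameter; upgrading to 0.11.1 remains the actual fix.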
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Belgium
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-10-13T16:26:12.180Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 691fc3ff70da09562fa7fc9b
Added to database: 11/21/2025, 1:44:31 AM
Last enriched: 11/28/2025, 4:41:49 AM
Last updated: 1/8/2026, 12:43:21 PM
Related Threats
- CVE-2025-62877: CWE-1188: Initialization of a Resource with an Insecure Default in SUSE harvester (Critical)
- CVE-2024-1574: CWE-470 Use of Externally-Controlled Input to Select Classes or Code ('Unsafe Reflection') in Mitsubishi Electric Iconics Digital Solutions GENESIS64 (Medium)
- CVE-2024-1573: CWE-306 Missing Authentication for Critical Function in Mitsubishi Electric Iconics Digital Solutions GENESIS64 (Medium)
- The State of Trusted Open Source (Medium)
- CVE-2024-1182: CWE-427 Uncontrolled Search Path Element in Mitsubishi Electric Iconics Digital Solutions GENESIS64 (High)