CVE-2025-62426: CWE-770: Allocation of Resources Without Limits or Throttling in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.
AI Analysis
Technical Summary
CVE-2025-62426 is a resource exhaustion vulnerability, classified under CWE-770 (Allocation of Resources Without Limits or Throttling), in vLLM, the vllm-project's inference and serving engine for large language models. Versions from 0.5.5 up to, but not including, 0.11.1 improperly handle the chat_template_kwargs parameter accepted by the /v1/chat/completions and /tokenize endpoints: the parameter is used in the code before it is validated against the chat template, so an attacker with low privileges can craft requests that force the server into prolonged processing or excessive resource allocation. This blocks the API server from processing other legitimate requests, causing service delays or outages. The vulnerability does not compromise confidentiality or integrity but directly impacts availability. Exploitation requires network access and low privileges but no user interaction. The vulnerability was publicly disclosed on November 21, 2025, with a CVSS v3.1 base score of 6.5, indicating medium severity, and no exploits are currently known in the wild. The issue is fixed in vLLM version 0.11.1, which validates chat_template_kwargs before use. This vulnerability is particularly relevant for organizations that deploy vLLM in their AI infrastructure, especially those exposing LLM inference services via API endpoints.
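The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not vLLM's actual code; the handler names and the "pad" key are hypothetical. The point is structural: when a rendering step whose cost is driven by an unvalidated request parameter runs synchronously on the server's event loop, a single expensive request stalls every other request until it finishes.

```python
import asyncio

def render_chat_template(chat_template_kwargs: dict) -> str:
    # Stand-in for synchronous chat-template rendering. The amount of work
    # is proportional to a request-supplied value ("pad" is a hypothetical
    # key) and nothing caps it before rendering begins.
    return "x" * int(chat_template_kwargs.get("pad", 0))

async def handle_completion(chat_template_kwargs: dict) -> str:
    # The CPU-bound render runs directly on the event loop, so while it is
    # executing, no other coroutine (i.e. no other request) makes progress.
    return render_chat_template(chat_template_kwargs)

result = asyncio.run(handle_completion({"pad": 8}))
```

With a sufficiently large "pad" value, the render dominates the loop and all concurrent requests are delayed, which is the throttling failure CWE-770 describes.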
Potential Impact
For European organizations, the primary impact of CVE-2025-62426 is on service availability. Organizations relying on vLLM for large language model inference may experience denial of service, leading to delays or outages in AI-driven applications such as chatbots, automated customer support, or data processing pipelines. This can disrupt business operations, degrade user experience, and cause financial or reputational damage. Because the vulnerability affects neither confidentiality nor integrity, data breaches are unlikely; however, the denial of service could be used as part of a broader attack strategy to disrupt critical AI services. Organizations in sectors with high dependency on AI, including finance, healthcare, and telecommunications, face the greatest operational risk. Since only low privileges are required, insider threats or compromised accounts could be leveraged to trigger the attack. The absence of known exploits in the wild reduces immediate risk but does not eliminate the potential for future attacks, especially as vLLM adoption grows in Europe.
Mitigation Recommendations
To mitigate CVE-2025-62426, European organizations should immediately upgrade all vLLM deployments to version 0.11.1 or later, where the vulnerability is patched. In addition, implement strict input validation and sanitization at the API gateway to detect and block malformed or suspicious chat_template_kwargs parameters before they reach the vLLM service. Enforce rate limiting and throttling on the /v1/chat/completions and /tokenize endpoints to prevent resource exhaustion from excessive or malformed requests. Configure monitoring and alerting to detect unusual spikes in request processing times or resource usage on these endpoints. Review access controls so that only authorized users with a legitimate need can invoke these endpoints, minimizing exposure to low-privilege attackers. Network segmentation and web application firewalls (WAFs) can provide additional layers of defense. Finally, maintain an incident response plan that covers denial-of-service incidents affecting AI inference services.
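The gateway-level validation suggested above can be sketched as a simple allowlist check applied before a request is forwarded to the inference service. The allowed key and the size cap below are illustrative assumptions, not vLLM's actual schema:

```python
ALLOWED_TEMPLATE_KWARGS = {"enable_thinking"}  # hypothetical allowlist
MAX_VALUE_LENGTH = 256                         # hypothetical size cap

def validate_chat_template_kwargs(payload: dict) -> dict:
    """Reject requests whose chat_template_kwargs contain unexpected keys
    or oversized values before they reach the inference service."""
    kwargs = payload.get("chat_template_kwargs") or {}
    unexpected = set(kwargs) - ALLOWED_TEMPLATE_KWARGS
    if unexpected:
        raise ValueError(
            f"unexpected chat_template_kwargs keys: {sorted(unexpected)}")
    for key, value in kwargs.items():
        if len(str(value)) > MAX_VALUE_LENGTH:
            raise ValueError(
                f"chat_template_kwargs value too large for {key!r}")
    return payload
```

A check of this shape is cheap to run at the gateway and fails closed: anything outside the expected schema is rejected before it can consume server resources.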
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Ireland
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-10-13T16:26:12.180Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 691fc3ff70da09562fa7fc9b
Added to database: 11/21/2025, 1:44:31 AM
Last enriched: 11/21/2025, 2:00:29 AM
Last updated: 11/21/2025, 1:55:35 PM
Related Threats
- CVE-2025-11127: CWE-639 Authorization Bypass Through User-Controlled Key in Mstoreapp Mobile App (severity: Unknown)
- Sliver C2 vulnerability enables attack on C2 operators through insecure Wireguard network (severity: Medium)
- CVE-2025-66115: Improper Control of Filename for Include/Require Statement in PHP Program ('PHP Remote File Inclusion') in MatrixAddons Easy Invoice (severity: Unknown)
- CVE-2025-66114: Missing Authorization in theme funda Show Variations as Single Products Woocommerce (severity: Unknown)
- CVE-2025-66113: Missing Authorization in ThemeAtelier Better Chat Support for Messenger (severity: Unknown)