CVE-2025-48956: CWE-400: Uncontrolled Resource Consumption in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.
AI Analysis
Technical Summary
CVE-2025-48956 is a high-severity Denial of Service (DoS) vulnerability affecting the vLLM inference and serving engine for large language models (LLMs), specifically versions from 0.1.0 up to but not including 0.10.1.1. The vulnerability arises from uncontrolled resource consumption (CWE-400) triggered by sending a single HTTP GET request containing an extremely large header to an HTTP endpoint exposed by the vLLM server. Such an oversized request causes the server to exhaust its memory, leading to potential crashes or unresponsiveness. The attack vector is network-based (AV:N) and requires no authentication (PR:N) and no user interaction (UI:N), making it trivially exploitable by any remote attacker. The vulnerability impacts availability only, with no direct confidentiality or integrity compromise. The issue has been addressed in version 0.10.1.1 of vLLM. No known exploits are currently reported in the wild. The CVSS v3.1 base score is 7.5, reflecting the ease of exploitation and significant impact on availability. Given that vLLM is used to serve large language models, often in AI-driven applications and services, this vulnerability could disrupt critical AI inference workloads by causing denial of service, impacting service continuity and reliability.
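As a quick triage step, the affected range given above (>= 0.1.0, < 0.10.1.1) can be encoded in a short check. The sketch below uses only the standard library; the helper names (`parse_version`, `is_vulnerable`) are illustrative, and pre-release suffixes (e.g. "0.10.1rc1") are not handled:

```python
# Minimal triage helper: is an installed vLLM version inside the
# advisory's affected range (>= 0.1.0 and < 0.10.1.1)?
# Note: pre-release tags like "0.10.1rc1" are NOT handled here.

def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '0.10.1.1' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

FIRST_AFFECTED = parse_version("0.1.0")
FIXED = parse_version("0.10.1.1")

def is_vulnerable(installed: str) -> bool:
    """True if the given version falls in the advisory's affected range."""
    v = parse_version(installed)
    return FIRST_AFFECTED <= v < FIXED
```

In practice the installed version can be obtained with `importlib.metadata.version("vllm")`, and production tooling may prefer `packaging.version.Version`, which handles pre-release suffixes correctly.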
Potential Impact
For European organizations deploying vLLM for AI inference and serving, this vulnerability poses a significant risk to service availability. Organizations relying on vLLM for customer-facing AI applications, internal automation, or research could experience service outages or degraded performance if targeted by this DoS attack. The lack of authentication requirement means that any external attacker can attempt exploitation, increasing the attack surface. Disruptions could affect sectors such as finance, healthcare, telecommunications, and government services where AI-driven applications are increasingly integrated. Additionally, the potential for memory exhaustion could lead to cascading failures in multi-tenant or cloud environments, amplifying the impact. The unavailability of AI services could delay critical decision-making processes or customer interactions, resulting in operational and reputational damage. However, since no data confidentiality or integrity is compromised, the primary concern remains service disruption.
Mitigation Recommendations
European organizations should immediately verify their vLLM deployment versions and upgrade to v0.10.1.1 or later to remediate this vulnerability. In environments where immediate patching is not feasible, implementing network-level protections is critical. This includes configuring web application firewalls (WAFs) or reverse proxies to limit the size of HTTP headers accepted by the vLLM endpoints, effectively blocking requests with abnormally large headers. Rate limiting and anomaly detection mechanisms should be employed to identify and block suspicious traffic patterns indicative of DoS attempts. Additionally, isolating vLLM services within segmented network zones and restricting access to trusted IP ranges can reduce exposure. Monitoring system memory usage and setting up alerts for unusual resource consumption can provide early warning signs of exploitation attempts. Finally, organizations should review incident response plans to include scenarios involving AI service unavailability.
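One concrete form of the header-size limiting described above: nginx deployed as a reverse proxy in front of the vLLM HTTP endpoint can cap header sizes with its standard buffer directives, answering oversized requests with a 400/414 itself rather than forwarding them. The hostname and upstream address below are placeholders, and the buffer values are illustrative starting points to be tuned against legitimate client traffic:

```nginx
server {
    listen 80;
    server_name vllm.example.com;          # placeholder hostname

    # Reject requests with oversized headers before they reach vLLM.
    client_header_buffer_size 1k;          # buffer for ordinary headers
    large_client_header_buffers 2 8k;      # at most 2 extra 8k buffers

    location / {
        proxy_pass http://127.0.0.1:8000;  # vLLM's default serving port (assumed)
    }
}
```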
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-05-28T18:49:07.585Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 68a73519ad5a09ad0011fe46
Added to database: 8/21/2025, 3:02:49 PM
Last enriched: 8/21/2025, 3:17:52 PM
Last updated: 8/21/2025, 3:17:52 PM
Related Threats
- CVE-2025-9310: Hard-coded Credentials in yeqifu carRental (Medium)
- CVE-2025-9309: Hard-coded Credentials in Tenda AC10 (Low)
- CVE-2025-57761: CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection') in LabRedesCefetRJ WeGIA (Critical)
- CVE-2025-43755: CWE-79: Cross-site Scripting in Liferay Portal (Medium)
- CVE-2025-57755: CWE-200: Exposure of Sensitive Information to an Unauthorized Actor in musistudio claude-code-router (High)