CVE-2025-30202: CWE-770: Allocation of Resources Without Limits or Throttling in vllm-project vllm
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.5.2 and prior to 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ in multi-node vLLM deployments. In a multi-node deployment, vLLM uses ZeroMQ for some inter-node communication: the primary vLLM host opens an XPUB ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts. Any client with network access to this host can connect to this XPUB socket unless its port is blocked by a firewall. Once connected, these arbitrary clients receive the same data broadcast to all of the secondary vLLM hosts. This data is internal vLLM state information that is not directly useful to an attacker. By opening many connections to this socket and never reading the data published to them, an attacker can also cause a denial of service by slowing down or potentially blocking the publisher. This issue has been patched in version 0.8.5.
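To make the exposure concrete, here is a minimal pyzmq sketch of the attacker's side; the host name and port are hypothetical, since the actual endpoint depends on the deployment. Any SUB socket that can reach the head node's XPUB endpoint receives the same broadcast the secondary hosts do:

```python
# Minimal sketch of the unauthenticated subscriber (host/port hypothetical).
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")        # subscribe to every topic
sub.connect("tcp://vllm-head-node:5557")  # hypothetical address and port

while True:
    frames = sub.recv_multipart()         # internal vLLM state, mirrored to us
    print(f"received {len(frames)} frame(s)")
```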
AI Analysis
Technical Summary
CVE-2025-30202 is a vulnerability in the vLLM project affecting versions from 0.5.2 up to, but not including, 0.8.5. vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs). In multi-node deployments, vLLM uses ZeroMQ for inter-node communication: the primary host opens an XPUB ZeroMQ socket bound to all network interfaces. The socket is only used during tensor parallelism across multiple hosts, but it is opened for every multi-node deployment and remains accessible to any client with network access to the host unless explicitly blocked by a firewall. An attacker can connect to this XPUB socket without authentication and receive all of the data broadcast to secondary vLLM hosts. Although that data consists of internal vLLM state information not directly useful for exfiltration or manipulation, the exposure still represents an information leakage risk. More critically, an attacker can establish many connections to the socket and deliberately never read the published data. Because ZeroMQ queues outbound messages per subscriber, such stalled consumers can slow down or block the publisher, effectively causing a denial of service (DoS) on the primary vLLM host. The root cause is allocation of resources without limits or throttling (CWE-770): client connections, and the per-connection queues they create, are unbounded and uncontrolled. The vulnerability is fixed in vLLM version 0.8.5, presumably by restricting access to the socket or introducing throttling. The CVSS 3.1 base score is 7.5 (high severity), reflecting the network attack vector, no required privileges or user interaction, and an impact limited to availability, with no confidentiality or integrity compromise scored. There are no known exploits in the wild at the time of publication.
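The sketch below illustrates the CWE-770 mechanic in isolation; it is not vLLM's actual code, and the port and high-water-mark values are arbitrary assumptions. An XPUB publisher allocates a separate outbound queue per subscriber, so peers that connect but never read consume those queues and per-connection resources without limit:

```python
# Hedged illustration of the slow-subscriber mechanic (not vLLM's code).
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.XPUB)
pub.setsockopt(zmq.SNDHWM, 1000)      # per-subscriber outbound queue cap
# pub.setsockopt(zmq.XPUB_NODROP, 1)  # with this set, send() blocks instead
pub.bind("tcp://*:5557")              # bound to ALL interfaces, as vLLM did

for i in range(100_000):
    # Each subscriber has its own queue. A peer that connects but never
    # recv()s fills its queue to SNDHWM; every such connection also pins
    # memory and a file descriptor, with no limit on how many may exist.
    pub.send_multipart([b"state", str(i).encode()])
```

By default an XPUB socket drops frames for a peer whose queue is full, while the ZMQ_XPUB_NODROP option makes sends block instead, which matches the advisory's "slowing down or potentially blocking the publisher" language.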
Potential Impact
For European organizations deploying vLLM in multi-node configurations, this vulnerability poses a significant risk of service disruption. The denial of service can degrade or halt inference services critical for AI-driven applications, potentially impacting business operations relying on LLMs for natural language processing, automation, or customer interaction. While the data exposure is limited to internal state information and is unlikely to lead to direct data breaches, it could assist attackers in reconnaissance or crafting further attacks if combined with other vulnerabilities. Organizations in sectors such as technology, finance, healthcare, and research that utilize vLLM for AI workloads are particularly at risk. The disruption of AI services could lead to operational downtime, loss of productivity, and reputational damage. Additionally, if vLLM is integrated into critical infrastructure or services, the availability impact could have broader consequences. Given the network-based nature of the attack and the lack of authentication, attackers from within or outside the organization’s network perimeter could exploit this vulnerability if network controls are insufficient. The absence of user interaction and low attack complexity increase the likelihood of exploitation attempts, especially in environments with exposed or poorly segmented network architectures.
Mitigation Recommendations
1. Upgrade to vLLM version 0.8.5 or later immediately; this release contains the patch for this vulnerability.
2. Implement strict network segmentation and firewall rules that restrict access to the XPUB ZeroMQ socket port, allowing connections only from trusted secondary vLLM hosts.
3. Monitor network traffic to the primary vLLM host for unusual connection patterns, such as many simultaneous connections from unknown clients or clients that never consume published data, which may indicate exploitation attempts (a reachability probe supporting this check is sketched below).
4. Employ rate limiting or connection throttling at the network or application layer to prevent resource exhaustion from excessive client connections.
5. If upgrading immediately is not feasible, consider avoiding multi-node deployments, or tensor parallelism across hosts, until the patch can be applied.
6. Conduct regular security audits and penetration testing focused on inter-node communication channels to detect and remediate similar resource exhaustion vulnerabilities.
7. Maintain an up-to-date inventory and configuration management of vLLM deployments so that vulnerable versions are identified and remediated promptly.
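As a practical check on recommendations 2 and 3, the following sketch probes whether the XPUB port is reachable from an untrusted network position. The host name and port are assumptions to be replaced with deployment-specific values; a received frame means the firewall rules in recommendation 2 are not in effect:

```python
# Hedged audit sketch: run from an untrusted network position.
import zmq

HEAD_NODE = "vllm-head-node.example.internal"  # assumption: your head node
PORT = 5557                                    # assumption: deployment-specific

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")
sub.setsockopt(zmq.RCVTIMEO, 5000)  # give up after 5 s of silence
sub.connect(f"tcp://{HEAD_NODE}:{PORT}")
try:
    sub.recv_multipart()
    print("EXPOSED: XPUB socket is reachable and broadcasting")
except zmq.Again:
    # No data within the timeout: the port may be filtered, or the socket
    # may simply be idle (no cross-host tensor-parallel traffic right now).
    print("No data within timeout; port may be filtered or idle")
finally:
    sub.close()
    ctx.term()
```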
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Ireland, Switzerland
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-03-18T18:15:13.849Z
- CISA Enriched: true
- CVSS Version: 3.1
- State: PUBLISHED
Related Threats
CVE-2025-9091: Hard-coded Credentials in Tenda AC20 (Low)
CVE-2025-9090: Command Injection in Tenda AC20 (Medium)
CVE-2025-9092: CWE-400 Uncontrolled Resource Consumption in Legion of the Bouncy Castle Inc. Bouncy Castle for Java - BC-FJA 2.1.0 (Low)
CVE-2025-9089: Stack-based Buffer Overflow in Tenda AC20 (High)
CVE-2025-9088: Stack-based Buffer Overflow in Tenda AC20 (High)