CVE-2025-30165: CWE-502: Deserialization of Untrusted Data in vllm-project vllm
vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host. When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine. Since the vulnerability exists in a client that connects to the primary vLLM host, this vulnerability serves as an escalation point. If the primary vLLM host is compromised, this vulnerability could be used to compromise the rest of the hosts in the vLLM deployment. Attackers could also use other means to exploit the vulnerability without requiring access to the primary vLLM host. One example would be the use of ARP cache poisoning to redirect traffic to a malicious endpoint used to deliver a payload with arbitrary code to execute on the target machine. Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern. Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, the maintainers of vLLM have decided not to fix this issue. Instead, the maintainers recommend that users ensure their environment is on a secure network if this deployment pattern is in use. The V1 engine is not affected by this issue.
AI Analysis
Technical Summary
CVE-2025-30165 is a high-severity vulnerability affecting the vLLM project, specifically versions from 0.5.2 up to 0.8.5.post1, involving the V0 engine used for multi-node deployments with tensor parallelism. vLLM is an inference and serving engine for large language models, and in multi-node setups using the V0 engine, it employs ZeroMQ sockets for inter-node communication. The vulnerability arises because the secondary vLLM hosts open a SUB ZeroMQ socket that connects to an XPUB socket on the primary host, and the data received on this SUB socket is deserialized using Python's pickle module. Pickle deserialization of untrusted data is inherently unsafe, as it can lead to arbitrary code execution. This vulnerability can be exploited by an attacker who can send maliciously crafted serialized data to the SUB socket, potentially allowing remote code execution on secondary hosts. The vulnerability serves as an escalation point within the vLLM deployment: if the primary host is compromised, the attacker can leverage this flaw to compromise secondary hosts. Moreover, exploitation does not strictly require prior access to the primary host; network-based attacks such as ARP cache poisoning could redirect traffic to a malicious endpoint that delivers a payload to execute arbitrary code on the target machine. The vulnerability is limited to the V0 engine, which has been disabled by default since version 0.8.0, and only applies to multi-host tensor parallelism deployments—a relatively uncommon configuration. The maintainers have chosen not to patch this vulnerability due to the invasive nature of the fix and the low expected usage of the vulnerable configuration. Instead, they recommend operating the vulnerable pattern only within secure, trusted networks. The V1 engine is not affected. 
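The danger described above follows directly from how `pickle` works: a serialized payload can name any importable callable via `__reduce__`, and that callable is invoked during deserialization. The following minimal, self-contained sketch (the `MaliciousPayload` class and `side_effect` function are illustrative, not taken from vLLM) models what happens when a handler calls `pickle.loads` on attacker-controlled bytes, as the vulnerable `SUB`-socket handler does; a real exploit would invoke something like `os.system` instead of a harmless flag-setter.

```python
import pickle

# Harmless stand-in for attacker code; a real payload would invoke
# os.system or similar. Set a flag so the effect is observable.
executed = {"flag": False}

def side_effect():
    executed["flag"] = True
    return "attacker-controlled result"

class MaliciousPayload:
    def __reduce__(self):
        # pickle will call side_effect() during deserialization
        return (side_effect, ())

# Bytes an attacker could publish toward the secondary host's SUB socket
wire_bytes = pickle.dumps(MaliciousPayload())

# Models the vulnerable receive path: pickle.loads runs the embedded
# callable immediately, before the application sees the "message".
result = pickle.loads(wire_bytes)
print(executed["flag"])
```

The key point is that the code runs as a side effect of deserialization itself; no flaw in the application logic that consumes the message is required.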
The CVSS v3.1 score is 8.0 (high), reflecting an adjacent-network attack vector, low attack complexity, low privileges required, no user interaction, and high confidentiality, integrity, and availability impacts.
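For reference, the 8.0 base score can be reproduced from the CVSS v3.1 formula. The sketch below assumes the vector AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H; the numeric weights are the standard ones from the CVSS v3.1 specification.

```python
import math

# Assumed vector: AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
AV, AC, PR, UI = 0.62, 0.77, 0.62, 0.85   # Adjacent, Low, Low, None
C = I = A = 0.56                           # High impact on C, I, and A

iss = 1 - (1 - C) * (1 - I) * (1 - A)      # Impact Sub-Score
impact = 6.42 * iss                        # scope unchanged
exploitability = 8.22 * AV * AC * PR * UI

# CVSS "round up" to one decimal place
base = math.ceil(min(impact + exploitability, 10) * 10) / 10
print(base)  # 8.0
```

Note that the same impact and exploitability sub-components with a pure network vector (AV:N, weight 0.85) would instead yield 8.8, which is why the 8.0 rating implies adjacent-network positioning (consistent with the ARP-poisoning scenario described above).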
Potential Impact
For European organizations deploying vLLM in multi-node configurations using the V0 engine, this vulnerability poses a significant risk of remote code execution leading to full compromise of secondary hosts. Given that vLLM is used for serving large language models, exploitation could result in unauthorized access to sensitive AI workloads, data leakage, manipulation of inference results, or disruption of AI services. The ability to escalate from a compromised primary host to secondary hosts could facilitate lateral movement within an organization's AI infrastructure. Additionally, network-based exploitation methods like ARP cache poisoning increase the attack surface, especially in less segmented or poorly secured internal networks. This could impact organizations relying on AI inference services for critical applications such as finance, healthcare, or government services, where confidentiality and integrity of AI outputs are paramount. The lack of a patch and the recommendation to rely on secure network environments place the onus on organizations to enforce strict network segmentation and monitoring, which may be challenging in complex or hybrid cloud environments common in Europe. However, since the vulnerable configuration is uncommon and the V0 engine is disabled by default, the overall exposure is somewhat limited but still critical where present.
Mitigation Recommendations
European organizations should first verify whether they are using the vLLM V0 engine in multi-node tensor parallelism deployments. If so, they should consider migrating to the V1 engine, which is not affected by this vulnerability. If migration is not immediately feasible, organizations must ensure that the vLLM deployment operates within a strictly controlled and segmented network environment, isolating the multi-node communication channels from untrusted networks and users. Implementing network-level protections such as VLAN segmentation, strict firewall rules, and intrusion detection/prevention systems to monitor ZeroMQ traffic can reduce exposure. Additionally, organizations should employ network security measures to prevent ARP cache poisoning attacks, including static ARP entries where practical, dynamic ARP inspection on switches, and monitoring for anomalous ARP traffic. Regular network traffic analysis and anomaly detection can help identify attempts to exploit this vulnerability. Since no patch is available, organizations should also consider compensating controls such as application-layer authentication or encryption for ZeroMQ communications if possible, to prevent unauthorized data injection. Finally, maintaining up-to-date asset inventories and conducting security assessments focused on AI infrastructure will help identify and remediate vulnerable deployments.
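One concrete shape for the application-layer compensating control mentioned above is a restricted unpickler that refuses to resolve any class outside an explicit allow-list, so a crafted payload cannot name arbitrary callables such as `os.system`. This is a hedged sketch, not a vLLM patch: the `ALLOWED` set below is illustrative, and a real deployment would list only the message types actually exchanged between vLLM nodes (note that a restricted unpickler narrows, but does not fully eliminate, pickle's attack surface).

```python
import io
import pickle

# Illustrative allow-list; restrict to the message types you actually use.
ALLOWED = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("builtins", "str"),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve globals that are explicitly allow-listed;
        # anything else (e.g. os.system) is rejected outright.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked: {module}.{name}")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data round-trips; a payload naming a blocked callable raises.
print(safe_loads(pickle.dumps({"tokens": [1, 2, 3]})))
```

Stronger options, where feasible, are to replace pickle with a schema-bound serializer (e.g. JSON or msgpack with explicit validation) or to authenticate and encrypt the ZeroMQ channel itself.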
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Belgium
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-03-17T12:41:42.567Z
- Cisa Enriched: true
- Cvss Version: 3.1
- State: PUBLISHED
Related Threats
- CVE-2025-7076: Improper Access Controls in BlackVue Dashcam 590X (Medium)
- CVE-2025-7075: Unrestricted Upload in BlackVue Dashcam 590X (Medium)
- CVE-2025-6022 (Low)
- CVE-2025-5316 (Low)
- CVE-2025-5104 (Low)