CVE-2025-32444: CWE-502: Deserialization of Untrusted Data in vllm-project vllm
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5 that use the vLLM integration with mooncake are vulnerable to remote code execution due to the use of pickle-based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker can reach them and carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.
AI Analysis
Technical Summary
CVE-2025-32444 is a critical remote code execution (RCE) vulnerability affecting the vLLM project, specifically versions from 0.6.5 up to but not including 0.8.5, when integrated with the mooncake component. vLLM is a high-throughput, memory-efficient inference and serving engine designed for large language models (LLMs). The vulnerability arises from the use of Python's pickle serialization over unsecured ZeroMQ sockets that are configured to listen on all network interfaces. Pickle is inherently unsafe when deserializing untrusted data because it allows arbitrary code execution during the deserialization process. In this case, the ZeroMQ sockets expose the deserialization endpoint broadly on the network, increasing the attack surface and enabling remote attackers to send maliciously crafted pickle payloads. This can lead to full system compromise without requiring authentication or user interaction. The vulnerability is classified under CWE-502 (Deserialization of Untrusted Data). The issue has been patched in vLLM version 0.8.5, which presumably replaces or secures the serialization mechanism and/or restricts socket exposure. Notably, vLLM instances that do not use the mooncake integration are not vulnerable, indicating the flaw is specific to that integration layer. Although no known exploits are currently reported in the wild, the CVSS v3.1 base score is 10.0, reflecting the highest severity due to network attack vector, no required privileges or user interaction, and complete impact on confidentiality, integrity, and availability with a scope change. This vulnerability poses a significant risk to any organization deploying vulnerable versions of vLLM with mooncake integration, especially in environments where the ZeroMQ sockets are exposed to untrusted networks or the internet.
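The underlying risk pattern can be shown with a minimal, illustrative sketch (this is not the actual vLLM/Mooncake code; the socket type, port, and payload below are hypothetical): any service that calls pickle.loads() on bytes received from a ZeroMQ socket bound to all interfaces will invoke whatever callable the sender encoded, because pickle's __reduce__ protocol lets a payload nominate an arbitrary function to be called during deserialization.

```python
# Illustrative sketch only -- not the actual vLLM/Mooncake code.
# The PULL/PUSH socket types, port, and payload are hypothetical examples.
import pickle
import zmq

def vulnerable_receiver(port: int = 5555) -> None:
    """Receiver exhibiting the CWE-502 pattern: pickle.loads() on network bytes."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    # Binding to 0.0.0.0 exposes the deserialization endpoint on every interface.
    sock.bind(f"tcp://0.0.0.0:{port}")
    data = sock.recv()
    # Unpickling untrusted bytes invokes whatever callable the sender encoded.
    obj = pickle.loads(data)
    print("received:", obj)

class AttackerChosenCallable:
    """Demonstrates why this amounts to code execution, using a harmless callable."""
    def __reduce__(self):
        # On unpickling, the receiver calls print(...); an attacker would instead
        # encode any importable callable and arguments of their choice.
        return (print, ("arbitrary callable invoked during pickle.loads()",))

def build_payload() -> bytes:
    return pickle.dumps(AttackerChosenCallable())
```

Delivering the output of build_payload() to the exposed socket would cause the receiving process to invoke the encoded callable with its own privileges, which is the remote code execution path described above.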
Potential Impact
For European organizations, the impact of this vulnerability can be severe, particularly for entities leveraging vLLM for AI inference workloads in production or research environments. Successful exploitation would allow attackers to execute arbitrary code remotely, potentially leading to full system takeover, data theft, manipulation of AI model outputs, or disruption of AI services. This could compromise sensitive intellectual property, customer data, or critical AI-driven business processes. Given the criticality and ease of exploitation, organizations in sectors such as finance, healthcare, telecommunications, and government—where AI inference engines may be integrated into operational workflows—face heightened risks. Additionally, the exposure of ZeroMQ sockets on all network interfaces increases the likelihood of attacks originating from both internal and external threat actors. The vulnerability could also be leveraged as a foothold for lateral movement within networks or to deploy ransomware or other malware payloads. The lack of required authentication and user interaction further exacerbates the threat, making automated exploitation feasible. The impact extends beyond confidentiality and integrity to availability, as compromised systems may be taken offline or manipulated to produce incorrect AI outputs, undermining trust in AI services.
Mitigation Recommendations
To mitigate this vulnerability, European organizations should:
1) Immediately upgrade all affected vLLM instances with mooncake integration to version 0.8.5 or later, where the vulnerability is patched.
2) If upgrading is not immediately feasible, restrict network exposure of ZeroMQ sockets by configuring them to bind only to localhost or trusted internal interfaces, preventing remote access from untrusted networks (see the sketch after this list).
3) Implement network-level controls such as firewall rules or segmentation to limit access to the ZeroMQ ports only to authorized systems.
4) Audit existing deployments to identify any instances running vulnerable versions with mooncake integration, using software inventory and network scanning tools.
5) Consider disabling or removing the mooncake integration if it is not essential to operations, thereby eliminating the attack vector.
6) Monitor network traffic for unusual or unexpected ZeroMQ communication patterns that could indicate exploitation attempts.
7) Employ runtime application self-protection (RASP) or endpoint detection and response (EDR) solutions to detect anomalous process behavior indicative of code execution attacks.
8) Educate development and operations teams on the risks of insecure deserialization and the importance of secure serialization methods, especially when exposing services over the network.
These steps go beyond generic patching advice by emphasizing network exposure controls and operational monitoring tailored to the nature of the vulnerability.
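As a concrete illustration of recommendations 1), 2), and 4), the following minimal sketch shows a simple version audit and a receiver that binds only to the loopback interface and uses a data-only format instead of pickle. It is a hedged example under assumed names: the port number and JSON message format are illustrative choices, not the actual vLLM/Mooncake configuration.

```python
# Minimal mitigation sketch -- illustrative only, not vLLM/Mooncake code.
# Assumes pyzmq is installed; port and message format are hypothetical.
import json
from importlib.metadata import version, PackageNotFoundError

import zmq

def vllm_is_patched() -> bool:
    """Rough audit: True if the installed vLLM is 0.8.5 or later.

    Simplified parsing; does not handle pre-release or dev version tags.
    """
    try:
        v = version("vllm")
    except PackageNotFoundError:
        return True  # vLLM is not installed on this host
    major, minor, patch = (int(x) for x in v.split(".")[:3])
    return (major, minor, patch) >= (0, 8, 5)

def hardened_receiver(port: int = 5555) -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    # Bind to loopback only, so the endpoint is unreachable from other hosts.
    sock.bind(f"tcp://127.0.0.1:{port}")
    data = sock.recv()
    # Use a data-only format (JSON here) instead of pickle, so deserialization
    # cannot trigger code execution.
    msg = json.loads(data)
    print("received:", msg)
```

Binding to loopback and avoiding pickle are stopgaps for deployments that cannot upgrade immediately; updating to vLLM 0.8.5 or later remains the primary fix.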
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Belgium, Italy, Spain
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-04-08T10:54:58.369Z
- Cisa Enriched: true
- Cvss Version: 3.1
- State: PUBLISHED
Threat ID: 682d983bc4522896dcbee2fc
Added to database: 5/21/2025, 9:09:15 AM
Last enriched: 6/25/2025, 5:51:12 AM
Last updated: 8/17/2025, 11:14:34 AM
Related Threats
CVE-2025-53948: CWE-415 Double Free in Santesoft Sante PACS Server (High)
CVE-2025-52584: CWE-122 Heap-based Buffer Overflow in Ashlar-Vellum Cobalt (High)
CVE-2025-46269: CWE-122 Heap-based Buffer Overflow in Ashlar-Vellum Cobalt (High)
CVE-2025-54862: CWE-79 Improper Neutralization of Input During Web Page Generation (XSS or 'Cross-site Scripting') in Santesoft Sante PACS Server (Medium)
CVE-2025-54759: CWE-79 Improper Neutralization of Input During Web Page Generation (XSS or 'Cross-site Scripting') in Santesoft Sante PACS Server (Medium)