CVE-2025-52566: CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer in ggml-org llama.cpp

High
Tags: vulnerability, CVE-2025-52566, CWE-119, CWE-195
Published: Tue Jun 24 2025, 03:21:19 UTC
Source: CVE Database V5
Vendor/Project: ggml-org
Product: llama.cpp

Description

llama.cpp provides inference for several LLM models in C/C++. Prior to version b5721, a signed vs. unsigned integer overflow in llama.cpp's tokenizer implementation (llama_vocab::tokenize, src/llama-vocab.cpp:3036) caused an incorrect size comparison when copying tokens, allowing an attacker to trigger a heap overflow in the inference engine via carefully crafted text input during tokenization. This issue has been patched in version b5721.

AI-Powered Analysis

Last updated: 06/24/2025, 03:54:59 UTC

Technical Analysis

CVE-2025-52566 is a high-severity vulnerability affecting versions of the open-source inference engine llama.cpp prior to version b5721. llama.cpp is a C/C++ implementation used to run inference on several large language models (LLMs). The vulnerability arises from an improper restriction of operations within the bounds of a memory buffer, specifically a signed versus unsigned integer overflow in the tokenizer implementation (llama_vocab::tokenize) at src/llama-vocab.cpp:3036. This flaw causes incorrect size comparisons during token copying, leading to a heap overflow when processing carefully crafted text inputs. The heap overflow can corrupt memory, potentially allowing an attacker to execute arbitrary code, cause denial of service, or compromise the confidentiality and integrity of the system running the vulnerable llama.cpp version.

Exploitation requires local access (AV:L) and user interaction (UI:R), but no privileges are required (PR:N). The vulnerability has a CVSS 3.1 base score of 8.6, reflecting its high impact on confidentiality, integrity, and availability; the scope is changed (S:C), meaning exploitation can affect components beyond the vulnerable llama.cpp process itself.

Although no exploits are currently known in the wild, the nature of the vulnerability and the increasing use of llama.cpp for LLM inference make it a significant risk. The issue has been patched in version b5721, and users are strongly advised to upgrade. The vulnerability is classified under CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer) and CWE-195 (Signed to Unsigned Conversion Error), both common causes of memory corruption bugs in C/C++ software.
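The CWE-195 pattern described above can be illustrated with a minimal sketch. This is hypothetical code (not the actual llama.cpp source): a caller-supplied capacity is held in a signed 32-bit integer, and comparing it against an unsigned size_t silently converts the signed value, so a negative capacity becomes a huge unsigned number and the bounds check wrongly passes.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical illustration of the bug class (CWE-195), not llama.cpp's code.
// If n_tokens_max is negative, the cast to size_t wraps it to an enormous
// value (e.g. -1 becomes SIZE_MAX), so the capacity check wrongly succeeds
// and a subsequent copy can overflow the destination heap buffer.
bool copy_allowed_unsafe(int32_t n_tokens_max, size_t n_needed) {
    return n_needed <= (size_t) n_tokens_max;  // BUG when n_tokens_max < 0
}

// Fixed variant: reject negative capacities before any unsigned comparison.
bool copy_allowed_safe(int32_t n_tokens_max, size_t n_needed) {
    if (n_tokens_max < 0) {
        return false;
    }
    return n_needed <= (size_t) n_tokens_max;
}
```

Compilers can flag this class of bug (`-Wsign-compare` in GCC/Clang), which is one reason such comparisons are worth treating as warnings-as-errors in C/C++ codebases.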

Potential Impact

For European organizations, the impact of this vulnerability can be substantial, especially for those integrating llama.cpp into AI-driven applications, research platforms, or commercial products involving large language model inference. Exploitation could lead to arbitrary code execution or denial of service on systems running vulnerable versions, potentially disrupting critical AI services or exposing sensitive data processed by these models. Given the growing adoption of LLMs in sectors such as finance, healthcare, telecommunications, and government, a successful attack could compromise intellectual property, customer data, or operational continuity. The vulnerability's requirement for local access and user interaction limits fully remote exploitation, but insider threats or compromised endpoints could be leveraged to trigger the flaw. Additionally, the changed scope indicates that exploitation may affect other system components beyond the llama.cpp process, increasing the risk of broader system compromise. Organizations relying on AI inference engines for decision-making or customer-facing services may face reputational damage and regulatory consequences under GDPR if personal data is exposed or systems are disrupted.

Mitigation Recommendations

1. Upgrade immediately to llama.cpp version b5721 or later, where the vulnerability is patched.
2. Implement strict input validation and sanitization on all text inputs fed into the tokenizer to reduce the risk of maliciously crafted inputs triggering the overflow.
3. Employ runtime memory protection mechanisms such as AddressSanitizer during development and testing to detect similar memory issues early.
4. Restrict access to systems running llama.cpp inference engines to trusted users only, and monitor for unusual user activity that could indicate exploitation attempts.
5. Use containerization or sandboxing to isolate the llama.cpp process, limiting the impact of potential exploitation on the host system.
6. Regularly audit and update all AI-related dependencies and libraries to ensure vulnerabilities are promptly addressed.
7. Incorporate anomaly detection on AI service outputs and system behavior to detect signs of compromise or instability caused by exploitation attempts.
8. Educate developers and operators about secure coding practices related to memory management in C/C++ to prevent similar vulnerabilities in custom extensions or integrations.
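As a sketch of the input-validation recommendation above, a caller-side guard can bound input size and reject invalid capacities before handing text to the tokenizer. The function name, size cap, and wrapper shape below are illustrative assumptions, not the real llama.cpp API:

```cpp
#include <string>
#include <cstdint>
#include <cstddef>

// Hypothetical caller-side guard (illustrative names, assumed policy):
// cap input length and validate the caller-supplied token capacity
// before invoking any tokenizer, so malformed sizes never reach it.
constexpr size_t MAX_INPUT_BYTES = 1 << 20;  // 1 MiB cap, an assumed policy value

bool validate_tokenizer_input(const std::string & text, int32_t n_tokens_max) {
    if (text.empty()) {
        return false;  // nothing to tokenize
    }
    if (text.size() > MAX_INPUT_BYTES) {
        return false;  // reject oversized input
    }
    if (n_tokens_max <= 0) {
        return false;  // reject non-positive capacities (the CWE-195 trigger)
    }
    return true;
}
```

Such a guard does not replace upgrading to b5721; it is defense in depth that keeps obviously invalid sizes away from lower-level code paths.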


Technical Details

Data Version
5.1
Assigner Short Name
GitHub_M
Date Reserved
2025-06-18T03:55:52.036Z
Cvss Version
3.1
State
PUBLISHED

Threat ID: 685a1dfadec26fc862d8f682

Added to database: 6/24/2025, 3:39:38 AM

Last enriched: 6/24/2025, 3:54:59 AM

Last updated: 8/13/2025, 9:45:09 AM
