CVE-2025-52566: CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer in ggml-org llama.cpp
llama.cpp is a C/C++ inference engine for several LLM models. Prior to version b5721, a signed vs. unsigned integer overflow in llama.cpp's tokenizer implementation (llama_vocab::tokenize, src/llama-vocab.cpp:3036) caused an incorrect size comparison when copying tokens, allowing a heap overflow in the llama.cpp inference engine via carefully manipulated text input during tokenization. This issue has been patched in version b5721.
AI Analysis
Technical Summary
CVE-2025-52566 is a high-severity vulnerability affecting versions of the open-source inference engine llama.cpp prior to b5721. llama.cpp is a C/C++ implementation used to run inference on several large language models (LLMs). The vulnerability arises from an improper restriction of operations within the bounds of a memory buffer, specifically a signed versus unsigned integer conversion error in the tokenizer implementation (llama_vocab::tokenize) at src/llama-vocab.cpp:3036. This flaw causes an incorrect size comparison during token copying, leading to a heap overflow when carefully crafted text input is processed. The resulting memory corruption could allow an attacker to execute arbitrary code, cause a denial of service, or compromise the confidentiality and integrity of the system running the vulnerable version.

Exploitation requires local access (AV:L) and user interaction (UI:R), but no privileges (PR:N). The vulnerability carries a CVSS 3.1 base score of 8.6, reflecting high impact on confidentiality, integrity, and availability; the scope is changed (S:C), meaning exploitation can affect components beyond the vulnerable llama.cpp process itself.

Although no exploits are currently reported in the wild, the nature of the flaw and the increasing use of llama.cpp for LLM inference make it a significant risk. The issue is patched in version b5721, and users are strongly advised to upgrade. The vulnerability is classified under CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer) and CWE-195 (Signed to Unsigned Conversion Error), both common causes of memory corruption bugs in C/C++ software.
Potential Impact
For European organizations, the impact of this vulnerability can be substantial, especially for those integrating llama.cpp into AI-driven applications, research platforms, or commercial products involving large language model inference. Exploitation could lead to arbitrary code execution or denial of service on systems running vulnerable versions, potentially disrupting critical AI services or exposing sensitive data processed by these models. Given the growing adoption of LLMs in sectors such as finance, healthcare, telecommunications, and government, a successful attack could compromise intellectual property, customer data, or operational continuity. The vulnerability's requirement for local access and user interaction somewhat limits remote exploitation, but insider threats or compromised endpoints could be leveraged to trigger the flaw. Additionally, the changed scope indicates that exploitation may affect system components beyond the llama.cpp process, increasing the risk of broader system compromise. Organizations relying on AI inference engines for decision-making or customer-facing services may face reputational damage and regulatory consequences under GDPR if personal data is exposed or systems are disrupted.
Mitigation Recommendations
1. Upgrade immediately to llama.cpp version b5721 or later, where the vulnerability is patched.
2. Implement strict input validation and sanitization on all text fed into the tokenizer to reduce the risk of maliciously crafted inputs triggering the overflow.
3. Employ runtime memory protection mechanisms such as AddressSanitizer during development and testing to detect similar memory issues early.
4. Restrict access to systems running llama.cpp inference engines to trusted users, and monitor for unusual user activity that could indicate exploitation attempts.
5. Use containerization or sandboxing to isolate the llama.cpp process, limiting the impact of potential exploitation on the host system.
6. Regularly audit and update all AI-related dependencies and libraries so vulnerabilities are promptly addressed.
7. Incorporate anomaly detection on AI service outputs and system behavior to detect signs of compromise or instability caused by exploitation attempts.
8. Educate developers and operators on secure memory-management practices in C/C++ to prevent similar vulnerabilities in custom extensions or integrations.
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Ireland, Belgium
Technical Details
- Data Version: 5.1
- Assigner Short Name: GitHub_M
- Date Reserved: 2025-06-18T03:55:52.036Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 685a1dfadec26fc862d8f682
Added to database: 6/24/2025, 3:39:38 AM
Last enriched: 6/24/2025, 3:54:59 AM
Last updated: 11/22/2025, 7:25:46 AM
Related Threats
- CVE-2025-11186 (Medium): CWE-79 Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') in humanityco Cookie Notice & Compliance for GDPR / CCPA
- CVE-2025-2609 (High): CWE-79 Improper Neutralization of Input During Web Page Generation (XSS or 'Cross-site Scripting') in MagnusSolution MagnusBilling
- CVE-2024-9643 (Critical): CWE-489 Active Debug Code in Four-Faith F3x36
- CVE-2025-65947 (High): CWE-400: Uncontrolled Resource Consumption in jzeuzs thread-amount
- CVE-2025-65946 (High): CWE-77: Improper Neutralization of Special Elements used in a Command ('Command Injection') in RooCodeInc Roo-Code