
CVE-2025-49847: CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer in ggml-org llama.cpp

High
Vulnerability | CVE-2025-49847 | CWE-119 | CWE-195
Published: Tue Jun 17 2025 (06/17/2025, 20:04:40 UTC)
Source: CVE Database V5
Vendor/Project: ggml-org
Product: llama.cpp

Description

llama.cpp is a C/C++ inference engine for several LLM models. Prior to version b5662, an attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp’s vocabulary-loading code. Specifically, the helper _try_copy in llama.cpp/src/vocab.cpp (llama_vocab::impl::token_to_piece()) casts a very large size_t token length to an int32_t, causing the length check (if (length < (int32_t)size)) to be bypassed. memcpy is then still called with the oversized size, letting a malicious model overwrite memory beyond the intended buffer. This can lead to arbitrary memory corruption and potential code execution. The issue has been patched in version b5662.

AI-Powered Analysis

AILast updated: 06/17/2025, 20:34:33 UTC

Technical Analysis

CVE-2025-49847 is a high-severity vulnerability affecting versions of the open-source project llama.cpp prior to b5662. llama.cpp is a C/C++ implementation used for inference of large language models (LLMs) and loads its vocabularies from GGUF model files. The vulnerability arises from improper bounds checking in the vocabulary-loading code, specifically within the helper function _try_copy in llama.cpp/src/vocab.cpp, used by llama_vocab::impl::token_to_piece(). The core issue is that a token length, originally a size_t, is cast to a signed 32-bit integer (int32_t) without adequate validation. An attacker-supplied GGUF model vocabulary with an excessively large token length can therefore bypass the length check (if (length < (int32_t)size)): the cast value may wrap negative, so the condition incorrectly evaluates as false and the error path is skipped. memcpy is then called with the oversized length, producing a buffer overflow that overwrites memory beyond the intended buffer and results in arbitrary memory corruption, with potential arbitrary code execution compromising the confidentiality, integrity, and availability of the host system.

The vulnerability requires no privileges (PR:N) but does require user interaction (UI:R): an attacker must convince a user to load a malicious GGUF model. The attack vector is network-based (AV:N) and the scope is unchanged (S:U), yielding a CVSS 3.1 base score of 8.8 with high impact on confidentiality, integrity, and availability. No known exploits are currently reported in the wild, and the issue has been patched in version b5662 of llama.cpp. The vulnerability is categorized under CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer) and CWE-195 (Signed to Unsigned Conversion Error).

Potential Impact

European organizations utilizing llama.cpp for LLM inference, particularly those integrating GGUF model vocabularies from external or untrusted sources, face significant risks. Exploitation can lead to arbitrary code execution, enabling attackers to execute malicious payloads, escalate privileges, or disrupt services. This can compromise sensitive data processed by LLMs, including intellectual property, personal data, or confidential communications, violating GDPR and other data protection regulations. The vulnerability's network attack vector and requirement for user interaction mean phishing or social engineering could be used to deliver malicious models. Sectors heavily relying on AI and LLMs, such as finance, healthcare, research institutions, and technology companies, are particularly vulnerable. Disruption or compromise of AI inference infrastructure could lead to operational downtime, reputational damage, and regulatory penalties. Given the growing adoption of LLMs in Europe, the threat surface is expanding, especially in organizations deploying custom or third-party GGUF models without strict validation.

Mitigation Recommendations

1. Upgrade immediately to llama.cpp version b5662 or later, where the vulnerability is patched.
2. Implement strict validation and sanitization of all GGUF model vocabularies before loading, including verifying token lengths and rejecting suspiciously large or malformed tokens.
3. Restrict model loading to trusted sources only; avoid loading models from unverified third parties or unknown origins.
4. Employ runtime memory protections such as Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP) to mitigate exploitation impact.
5. Monitor and log all model-loading activity to detect anomalous or unexpected model files.
6. Educate users on the risks of loading untrusted models and enforce policies to prevent unauthorized model usage.
7. For organizations embedding llama.cpp in larger systems, conduct thorough code audits and fuzz testing focused on vocabulary-processing components.
8. Consider sandboxing the inference environment to contain potential exploitation effects.
9. Maintain up-to-date threat intelligence feeds to detect emerging exploitation attempts.


Technical Details

Data Version
5.1
Assigner Short Name
GitHub_M
Date Reserved
2025-06-11T14:33:57.800Z
Cvss Version
3.1
State
PUBLISHED

Threat ID: 6851cdd1a8c921274386250d

Added to database: 6/17/2025, 8:19:29 PM

Last enriched: 6/17/2025, 8:34:33 PM

Last updated: 8/2/2025, 6:13:48 PM
