CVE-2026-33298: CWE-122: Heap-based Buffer Overflow in ggml-org llama.cpp
llama.cpp is a C/C++ library for inference of several LLM models. Prior to release b7824, an integer overflow in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. The overflow causes `ggml_nbytes` to return a far smaller size than actually required (e.g., 4 MB instead of exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. This vulnerability enables potential remote code execution (RCE) via memory corruption. Release b7824 contains a fix.
AI Analysis
Technical Summary
The vulnerability identified as CVE-2026-33298 affects the llama.cpp project by ggml-org, a C/C++ implementation for inference of large language models (LLMs). The root cause is an integer overflow in the function ggml_nbytes, which calculates the number of bytes needed to store tensor data. When processing a GGUF file with maliciously crafted tensor dimensions, ggml_nbytes returns a significantly smaller size than actually required, e.g., reporting 4 MB instead of exabytes, because the dimension product wraps around (CWE-190). This miscalculation causes the application to allocate insufficient heap memory. Subsequent operations that write tensor data overflow the allocated buffer (CWE-122), corrupting adjacent memory. This memory corruption can be exploited to achieve remote code execution (RCE), compromising the system running the vulnerable llama.cpp version. The vulnerability requires local access and user interaction (loading a crafted GGUF file) but does not require privileges. The flaw was patched in release b7824, which corrects the integer overflow and enforces proper memory validation. No known exploits have been reported in the wild yet, but the severity score of 7.8 (CVSS 3.1) reflects the high impact on confidentiality, integrity, and availability if exploited. The vulnerability affects all versions prior to b7824 and is relevant to any deployment using llama.cpp for LLM inference, especially where untrusted GGUF files might be processed.
Potential Impact
The impact of CVE-2026-33298 is significant for organizations using llama.cpp for large language model inference, particularly in environments where untrusted or user-supplied GGUF files are processed. Successful exploitation can lead to remote code execution, allowing attackers to execute arbitrary code with the privileges of the application, potentially leading to full system compromise. This threatens confidentiality by exposing sensitive data processed by the model, integrity by enabling unauthorized code or data manipulation, and availability by causing crashes or denial of service. Since llama.cpp is used in AI/ML workloads, exploitation could disrupt critical AI services, degrade trust in AI outputs, or be leveraged as a foothold for lateral movement within networks. The requirement for local access and user interaction limits remote exploitation but does not eliminate risk in multi-user or cloud environments where users can upload or influence GGUF files. The absence of known exploits in the wild suggests a window for proactive patching. However, the growing adoption of LLMs and AI inference frameworks globally increases the potential attack surface.
Mitigation Recommendations
To mitigate CVE-2026-33298, organizations should immediately update llama.cpp to version b7824 or later, which contains the fix for the integer overflow and buffer overflow. Additionally, implement strict validation and sanitization of GGUF files before processing, including checks on tensor dimensions and file integrity to prevent malformed inputs. Employ runtime memory protection mechanisms such as Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP) to reduce exploitation success. Restrict access to systems running llama.cpp to trusted users and environments, minimizing the risk of malicious file uploads or execution. Monitor logs and system behavior for anomalies indicative of memory corruption or exploitation attempts. Where possible, sandbox the inference environment to contain potential compromises. Educate developers and operators about the risks of processing untrusted model files and enforce secure coding practices in AI/ML pipelines. Finally, maintain an incident response plan tailored to AI infrastructure breaches.
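The dimension validation recommended above amounts to overflow-checked size arithmetic. The following is a hedged sketch of that kind of check, using the GCC/Clang builtin `__builtin_mul_overflow`; the function name and signature are illustrative, not the actual llama.cpp API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative validation sketch (not the real llama.cpp fix): reject any
 * tensor whose byte size cannot be computed without overflow. Requires
 * GCC or Clang for __builtin_mul_overflow. */
static bool checked_tensor_nbytes(const int64_t ne[4], uint64_t type_size,
                                  uint64_t *out_bytes) {
    uint64_t n = 1;
    for (int i = 0; i < 4; i++) {
        if (ne[i] < 0) {
            return false; /* negative dimension: malformed file */
        }
        if (__builtin_mul_overflow(n, (uint64_t)ne[i], &n)) {
            return false; /* element count would wrap */
        }
    }
    if (__builtin_mul_overflow(n, type_size, &n)) {
        return false; /* byte count would wrap */
    }
    *out_bytes = n;
    return true;
}
```

With this pattern, the crafted dimensions from the advisory are rejected outright instead of silently yielding a small allocation, so the subsequent tensor write can never outgrow its buffer.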
Affected Countries
United States, China, Germany, United Kingdom, Canada, France, Japan, South Korea, India, Australia
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2026-03-18T18:55:47.427Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 69c1debff4197a8e3babf86e
Added to database: 3/24/2026, 12:45:51 AM
Last enriched: 3/24/2026, 1:00:57 AM
Last updated: 3/24/2026, 4:41:40 AM