CVE-2026-21869: CWE-787: Out-of-bounds Write in ggml-org llama.cpp
llama.cpp is a C/C++ library for inference of several LLM models. In commit 55d4206c8 and prior commits, the n_discard parameter is parsed directly from JSON input in the llama.cpp server's completion endpoints without validation to ensure it is non-negative. When a negative value is supplied and the context fills up, llama_memory_seq_rm/add receive a reversed range and a negative offset, causing out-of-bounds memory writes in the token evaluation loop. This deterministic memory corruption can crash the process or enable remote code execution (RCE). There is no fix at the time of publication.
AI Analysis
Technical Summary
CVE-2026-21869 is a critical memory corruption vulnerability classified under CWE-787 (Out-of-bounds Write) in llama.cpp, a C/C++ implementation for inference of large language models (LLMs). The flaw exists because the n_discard parameter, which controls how many tokens are discarded when the context window fills during inference, is parsed directly from JSON input without any check that it is non-negative. When a negative n_discard value is supplied and the context buffer fills up, the llama_memory_seq_rm/add functions receive a reversed range and a negative offset, causing out-of-bounds writes in the token evaluation loop. This deterministic memory corruption can crash the process or potentially allow remote code execution (RCE) by an attacker who can send crafted JSON requests to the server's completion endpoints. The attack vector is remote and requires no prior authentication; the only interaction involved is the submission of malicious input to the exposed endpoints. The vulnerability affects all versions of llama.cpp up to and including commit 55d4206c8, and no patch or fix is available as of the publication date (January 7, 2026). Exploitation could therefore compromise the confidentiality, integrity, and availability of affected systems by allowing arbitrary remote code execution. The missing input validation in the server's JSON parsing logic is a critical oversight, and the vulnerability is particularly dangerous in environments where llama.cpp is exposed to untrusted users or integrated into larger AI service infrastructures.
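The self-contained C++ sketch below illustrates the failure mode described above under stated assumptions. It does not call llama.cpp itself: ToyCache, cache_seq_rm, and cache_seq_add are simplified stand-ins for the real KV cache and for llama_memory_seq_rm/llama_memory_seq_add, and the context-shift arithmetic mirrors the advisory's description rather than the exact server source.

```cpp
// Minimal sketch (assumptions noted above): a toy KV cache shifted with an
// unvalidated, attacker-controlled n_discard taken from a JSON request.
#include <algorithm>
#include <cstdio>
#include <vector>

struct ToyCache {
    std::vector<int> pos;   // one cached position per token
};

// Stand-in for llama_memory_seq_rm: drop entries with position in [p0, p1).
static void cache_seq_rm(ToyCache &c, int p0, int p1) {
    c.pos.erase(std::remove_if(c.pos.begin(), c.pos.end(),
                               [&](int p) { return p >= p0 && p < p1; }),
                c.pos.end());
}

// Stand-in for llama_memory_seq_add: shift positions in [p0, p1) by delta.
static void cache_seq_add(ToyCache &c, int p0, int p1, int delta) {
    for (int &p : c.pos) {
        if (p >= p0 && p < p1) {
            p += delta;
        }
    }
}

int main() {
    const int n_ctx  = 8;   // toy context size
    const int n_keep = 2;   // tokens kept at the start of the context

    ToyCache cache;
    for (int i = 0; i < n_ctx; ++i) {
        cache.pos.push_back(i);
    }

    // Attacker-controlled value taken straight from the JSON body with no
    // non-negativity check -- the missing validation described above.
    const int n_discard = -4;

    // Context shift as described in the advisory: remove n_discard tokens
    // after n_keep, then slide the remaining tokens back by n_discard.
    cache_seq_rm (cache, n_keep, n_keep + n_discard);            // [2, -2): reversed range, removes nothing
    cache_seq_add(cache, n_keep + n_discard, n_ctx, -n_discard); // start offset -2 is negative, shift is +4

    // Positions now run past n_ctx; per the advisory, the real server's token
    // evaluation loop then writes through such out-of-range positions, which
    // is the out-of-bounds write.
    for (int p : cache.pos) {
        std::printf("%d ", p);  // prints 4 5 6 7 8 9 10 11 for n_discard = -4
    }
    std::printf("\n");
    return 0;
}
```

With the intended non-negative n_discard the same two calls shrink the cache and shift its tail back toward n_keep; the sign flip is what turns a routine context shift into memory corruption.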
Potential Impact
For European organizations, the impact of CVE-2026-21869 can be severe, especially for those deploying llama.cpp-based LLM inference servers in production, research, or cloud environments. Exploitation could give attackers unauthorized control over inference servers through remote code execution, resulting in data breaches, manipulation of AI model outputs, or disruption of AI services. Confidentiality is at risk if sensitive data processed by the LLM is exposed or exfiltrated; integrity can be compromised if attackers alter inference results or inject malicious payloads; and availability is threatened by crashes caused by the memory corruption. Organizations relying on AI-driven decision-making or customer-facing AI services could suffer reputational damage and operational downtime. The absence of a patch increases the urgency of interim mitigations, and GDPR compliance may be affected if personal data is compromised through exploitation of this vulnerability.
Mitigation Recommendations
Since no official patch is available, European organizations should implement immediate mitigations to reduce risk. First, restrict network access to llama.cpp inference endpoints to trusted internal users only, using network segmentation and firewall rules. Second, implement input validation proxies or web application firewalls (WAFs) that enforce non-negative constraints on the n_discard parameter before requests reach the server. Third, monitor logs and network traffic for anomalous JSON inputs or repeated requests with negative n_discard values. Fourth, consider deploying llama.cpp inference servers in isolated, sandboxed environments or containers with limited privileges to contain potential exploitation. Fifth, if feasible, temporarily disable or limit the use of the vulnerable completion endpoints until a patch is released. Finally, maintain close communication with the ggml-org project for updates and apply patches promptly once available.
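As a concrete illustration of the second recommendation, the sketch below shows the kind of guard a validating proxy (or a local server patch) could apply before a request reaches the inference code. It assumes the nlohmann/json library, which the llama.cpp server already uses for request parsing; the helper name parse_n_discard is hypothetical and not part of the llama.cpp API.

```cpp
// Hedged sketch: reject a negative n_discard before it reaches the server.
// Assumes nlohmann/json; parse_n_discard is a hypothetical helper, not
// existing llama.cpp code.
#include <optional>
#include <stdexcept>
#include <nlohmann/json.hpp>

// Returns the validated n_discard, or std::nullopt if the field is absent
// (letting the server fall back to its default).
std::optional<int> parse_n_discard(const nlohmann::json &body) {
    if (!body.contains("n_discard")) {
        return std::nullopt;
    }
    const auto &v = body.at("n_discard");
    if (!v.is_number_integer()) {
        throw std::invalid_argument("n_discard must be an integer");
    }
    const int n_discard = v.get<int>();
    if (n_discard < 0) {
        // The vulnerable code path is only reachable with a negative value,
        // so rejecting it here (e.g. returning HTTP 400 from the proxy)
        // blocks the attack without affecting legitimate clients.
        throw std::invalid_argument("n_discard must be non-negative");
    }
    return n_discard;
}
```

A WAF rule achieving the same effect only needs to inspect the JSON body of POST requests to the completion endpoints and drop any request whose n_discard field is negative.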
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Switzerland
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2026-01-05T16:44:16.368Z
- CVSS Version: 3.1
- State: PUBLISHED