CVE-2026-24158: CWE-789 Memory Allocation with Excessive Size Value in NVIDIA Triton Inference Server
CVE-2026-24158 is a high-severity vulnerability in NVIDIA Triton Inference Server affecting all versions prior to 26.01. It involves improper handling of memory allocation sizes when processing large compressed payloads via the HTTP endpoint, leading to potential denial of service (DoS). The flaw stems from CWE-789, where excessive memory allocation requests can exhaust system resources. Exploitation requires no authentication or user interaction and can be triggered remotely over the network. Although no known exploits are currently reported in the wild, the vulnerability poses a significant risk to the availability of AI inference services. Organizations using Triton Inference Server should prioritize patching to version 26.01 or later once available. Mitigations include limiting payload sizes, implementing rate limiting, and monitoring for abnormal HTTP request patterns. Countries with high adoption of AI infrastructure and NVIDIA products, including the United States, China, Germany, Japan, South Korea, and the United Kingdom, are most likely to be affected.
AI Analysis
Technical Summary
CVE-2026-24158 is a vulnerability identified in NVIDIA's Triton Inference Server, a widely used platform for deploying AI models in production environments. The issue arises from the server's HTTP endpoint, which processes incoming compressed payloads without adequately validating the size of memory allocation requests. Specifically, the vulnerability is categorized under CWE-789, indicating that the server may attempt to allocate an excessively large amount of memory based on attacker-supplied input. This can lead to resource exhaustion, causing the server to crash or become unresponsive, effectively resulting in a denial of service (DoS). The vulnerability affects all versions of Triton Inference Server prior to 26.01. The CVSS v3.1 base score is 7.5, reflecting high severity due to the ease of remote exploitation (no privileges or user interaction required) and the impact on availability. While no public exploits have been reported yet, the potential for disruption in AI inference workloads is significant, especially in environments relying on Triton for critical machine learning services. The vulnerability does not impact confidentiality or integrity but solely targets availability. The lack of authentication requirements and the network attack vector increase the risk profile. The technical root cause is insufficient validation of the size parameter during decompression or memory allocation, allowing attackers to craft payloads that trigger excessive memory requests. This flaw can be exploited by sending specially crafted HTTP requests with large compressed payloads to the vulnerable server.
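The root cause described above, trusting attacker-controlled input to drive allocation size during decompression, is typically remediated by bounding decompressed output incrementally rather than allocating up front. The following Python sketch illustrates the general defensive pattern only; it is not Triton's actual implementation (which is C++), and the 64 MiB cap is an assumed value for illustration:

```python
# Illustrative sketch of the CWE-789 fix pattern: never allocate based on a
# claimed decompressed size; decompress incrementally against a hard cap.
import zlib

MAX_DECOMPRESSED_BYTES = 64 * 1024 * 1024  # assumed 64 MiB cap, tune per workload

def safe_decompress(payload: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    """Decompress a zlib payload, refusing to produce more than `limit` bytes."""
    decomp = zlib.decompressobj()
    # max_length bounds the output buffer; zlib stops once the cap is reached.
    out = decomp.decompress(payload, limit)
    if decomp.unconsumed_tail:
        # Input remained after hitting the cap: reject instead of allocating more.
        raise ValueError("decompressed payload exceeds configured limit")
    return out
```

A server applying this pattern fails fast on a decompression bomb instead of attempting the oversized allocation that triggers the DoS condition.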
Potential Impact
The primary impact of CVE-2026-24158 is denial of service, which can disrupt AI inference services relying on NVIDIA Triton Inference Server. Organizations deploying AI models in production environments may experience service outages, degraded performance, or crashes, affecting business-critical applications such as autonomous systems, real-time analytics, and cloud-based AI services. This disruption can lead to operational downtime, loss of customer trust, and potential financial losses. Since the vulnerability is remotely exploitable without authentication, attackers can launch DoS attacks at scale, potentially targeting multiple servers or cloud instances. The impact is particularly severe for organizations with high dependency on AI inference for decision-making or customer-facing services. Additionally, the resource exhaustion could be leveraged as part of a larger multi-vector attack to weaken defenses. Although confidentiality and integrity are not directly affected, the availability impact alone can have cascading effects on organizational operations and service level agreements (SLAs).
Mitigation Recommendations
To mitigate CVE-2026-24158, organizations should upgrade NVIDIA Triton Inference Server to version 26.01 or later as soon as the patch becomes available. In the interim, implement strict input validation and limit the maximum allowed size for compressed payloads at the HTTP endpoint to prevent excessive memory allocation attempts. Deploy network-level protections such as rate limiting, web application firewalls (WAFs), and anomaly detection systems to identify and block suspicious large payload requests. Monitor server resource usage and HTTP request patterns to detect early signs of exploitation attempts. Consider isolating Triton servers in segmented network zones with restricted access to reduce exposure. Employ runtime memory monitoring tools to detect abnormal allocation spikes. Additionally, review and harden server configurations to minimize attack surface, including disabling unnecessary HTTP endpoints if possible. Maintain up-to-date incident response plans to quickly address potential DoS incidents. Regularly audit and test the environment for similar memory allocation vulnerabilities.
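The interim payload-size limit recommended above can be enforced in a reverse proxy or middleware sitting in front of Triton. The sketch below shows one possible gatekeeper check; the 8 MiB cap and the header handling are illustrative assumptions, not NVIDIA guidance:

```python
# Hypothetical gatekeeper for a proxy/middleware in front of Triton: reject
# oversized or malformed compressed payloads before they reach the server.

MAX_COMPRESSED_BYTES = 8 * 1024 * 1024  # assumed 8 MiB cap; tune per workload

def should_forward(headers: dict) -> bool:
    """Return True only if the request's declared size passes the cap."""
    try:
        length = int(headers.get("Content-Length", "0"))
    except ValueError:
        return False  # malformed Content-Length: drop the request
    if length <= 0 or length > MAX_COMPRESSED_BYTES:
        return False  # missing, zero, or oversized payload: drop
    return True
```

Note that Content-Length reflects the compressed size, so this check complements, rather than replaces, a cap on decompressed output at the application layer.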
Affected Countries
United States, China, Germany, Japan, South Korea, United Kingdom, Canada, France, India, Australia
Technical Details
- Data Version: 5.2
- Assigner Short Name: nvidia
- Date Reserved: 2026-01-21T19:09:29.851Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 69c2f483f4197a8e3b75624b
Added to database: 3/24/2026, 8:30:59 PM
Last enriched: 3/24/2026, 8:46:39 PM
Last updated: 3/24/2026, 9:49:14 PM