CVE-2026-44223: CWE-131: Incorrect Calculation of Buffer Size in vllm-project vllm
vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
AI Analysis
Technical Summary
CVE-2026-44223 affects vLLM versions 0.18.0 up to, but not including, 0.20.0. If any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty), the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step. The resulting RuntimeError crashes the EngineCore process, causing a denial of service. The flaw is classified as an incorrect calculation of buffer size (CWE-131); this analysis also maps it to CWE-704, which the CWE catalog defines as incorrect type conversion or cast. The issue is resolved in vLLM 0.20.0.
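For triage, it can help to know which incoming requests carry the penalty parameters named above. The following is a minimal sketch, not part of vLLM: the helper name penalty_params_in_use and the NEUTRAL_PENALTIES table are hypothetical, and it assumes that 1.0 (repetition) and 0.0 (frequency, presence) are the neutral values. The advisory does not say whether an explicitly neutral value also triggers the crash, so treating neutral values as inactive is an assumption.

```python
# Illustrative triage helper; not part of vLLM and not a vendor-endorsed workaround.
# Assumption: these are the neutral values that leave sampling unpenalized.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def penalty_params_in_use(request_body: dict) -> list:
    """Return the names of penalty parameters set to a non-neutral value."""
    active = []
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = request_body.get(name)
        # A missing or null parameter is treated as not in use.
        if value is not None and float(value) != neutral:
            active.append(name)
    return active


if __name__ == "__main__":
    # The example value from the advisory text.
    print(penalty_params_in_use({"prompt": "hello", "repetition_penalty": 1.1}))
```

A helper like this could be run over request logs to estimate how often a deployment on an affected version is exposed to the crash condition.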
Potential Impact
A single request that includes a sampling penalty parameter crashes the vLLM EngineCore process, causing a denial of service for the inference and serving engine. No impact on confidentiality or integrity has been reported, and no exploits in the wild have been documented.
Mitigation Recommendations
Upgrade vLLM to version 0.20.0 or later; the vendor states that the vulnerability is fixed in that release. No other mitigation or workaround is indicated.
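To decide whether an environment needs the upgrade, the affected range (0.18.0 inclusive to 0.20.0 exclusive) can be checked against the installed package version. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the package is installed under the distribution name vllm. It compares only the leading numeric components, so a pre-release such as 0.20.0rc1 is treated like 0.20.0, which may not reflect whether that build actually contains the fix.

```python
import re
from importlib import metadata

# Range taken from the advisory: 0.18.0 <= version < 0.20.0.
FIRST_AFFECTED = (0, 18, 0)
FIRST_FIXED = (0, 20, 0)


def release_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def is_affected(version: str) -> bool:
    """True if the version falls inside the range named in the advisory."""
    return FIRST_AFFECTED <= release_tuple(version) < FIRST_FIXED


def installed_vllm_is_affected():
    """Check the installed vllm distribution; None if it is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_vllm_is_affected())
```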
CVSS v3.1
Score: 6.5 (Medium)
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2026-05-05T15:42:40.518Z
- CVSS Version: 3.1
- State: PUBLISHED
- Remediation Level: null
Threat ID: 6a038bd7cbff5d8610164968
Added to database: 05/12/2026, 20:21:43 UTC
Last enriched: 06/23/2026, 14:17:06 UTC
Last updated: 07/03/2026, 04:39:16 UTC