
CVE-2025-1194: CWE-1333 Inefficient Regular Expression Complexity in huggingface huggingface/transformers

Severity: Medium
Tags: Vulnerability, CVE-2025-1194, CWE-1333
Published: Tue Apr 29 2025 (04/29/2025, 11:30:38 UTC)
Source: CVE
Vendor/Project: huggingface
Product: huggingface/transformers

Description

A Regular Expression Denial of Service (ReDoS) vulnerability was identified in the huggingface/transformers library, specifically in the file `tokenization_gpt_neox_japanese.py` of the GPT-NeoX-Japanese model. The vulnerability occurs in the SubWordJapaneseTokenizer class, whose regular expressions can be fed specially crafted inputs. The issue stems from a regex that exhibits exponential complexity under certain conditions, leading to excessive backtracking. This can result in high CPU usage and potential application downtime, effectively creating a Denial of Service (DoS) scenario. The affected version is v4.48.1, the latest release at the time of reporting.
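To illustrate the backtracking mechanism described above, here is a minimal Python sketch built on the textbook nested-quantifier pattern `(a+)+$`. This pattern is an assumption chosen for illustration only; it is not the regex shipped in `tokenization_gpt_neox_japanese.py`. On an input of n letters followed by a character that cannot match, the engine tries every way of splitting the run of letters between the inner and outer quantifier before giving up, so the work roughly doubles with each added character. The equivalent pattern `a+$` accepts the same strings and fails in linear time.

```python
import re
import time

# Illustrative patterns only; NOT the regex used in tokenization_gpt_neox_japanese.py.
VULNERABLE = re.compile(r"(a+)+$")  # nested quantifier: exponential backtracking on failure
SAFE = re.compile(r"a+$")           # accepts the same strings, fails in linear time


def time_match(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds taken by a single pattern.match() call."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (16, 18, 20):
        # The trailing "!" guarantees failure, so every split of the a's is tried.
        evil = "a" * n + "!"
        print(
            f"n={n}: vulnerable {time_match(VULNERABLE, evil):.4f}s, "
            f"safe {time_match(SAFE, evil):.6f}s"
        )
```

The vulnerable timings should grow roughly fourfold for each step of two characters, while the safe pattern stays near zero; exact figures depend on the machine and Python version.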

AI-Powered Analysis

Last updated: 06/24/2025, 23:05:00 UTC

Technical Analysis

CVE-2025-1194 is a Regular Expression Denial of Service (ReDoS) vulnerability in the huggingface/transformers library, specifically in the tokenization component of the GPT-NeoX-Japanese model. The vulnerability resides in the SubWordJapaneseTokenizer class, implemented in the file tokenization_gpt_neox_japanese.py, which uses regular expressions to process Japanese text inputs during tokenization. The flaw is a regular expression that exhibits exponential time complexity on certain crafted inputs, causing excessive backtracking during evaluation. When exploited, this leads to high CPU consumption that can degrade application performance or cause downtime, effectively resulting in a Denial of Service (DoS). The affected version is v4.48.1, which was the latest release at the time of disclosure.

The CVSS v3.0 base score is 4.3, indicating medium severity. The attack vector is network-based (AV:N) and requires no privileges (PR:N) but does require user interaction (UI:R); the impact is limited to a low availability impact (A:L), with no effect on confidentiality or integrity. As of this analysis, no exploits have been reported in the wild and no official patch has been linked. The vulnerability is categorized under CWE-1333 (Inefficient Regular Expression Complexity), the weakness class covering ReDoS issues in which poorly constructed regex patterns can be exploited to exhaust resources.

This vulnerability is particularly relevant for applications or services that use the huggingface transformers library for Japanese language processing, especially those that expose tokenization functionality to untrusted input, such as web APIs or user-facing NLP services.
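The 4.3 base score can be reproduced from the CVSS v3.0 base equations. The following is a minimal Python sketch of the scope-unchanged formulas, using the metric weights from the FIRST CVSS v3.0 specification. The analysis above does not state Attack Complexity or Scope, so the example assumes AC:L and S:U, the combination consistent with the published 4.3 given the other listed metrics. The `roundup` helper follows the float-safe variant published with CVSS v3.1, which is intended to give the same results as the v3.0 definition.

```python
import math

# CVSS v3.0 base metric weights (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope: Unchanged values
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """Smallest one-decimal value >= x, avoiding float artifacts (v3.1 Appendix A style)."""
    int_input = round(x * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str, c: str, i: str, a: str) -> float:
    """Base score for a Scope: Unchanged vector."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])  # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Assumed vector: AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:N/A:L
print(base_score("N", "L", "N", "R", "N", "N", "L"))  # 4.3
```

Here the impact term is 6.42 × 0.22 ≈ 1.41 and the exploitability term is ≈ 2.84, which sum to ≈ 4.25 and round up to 4.3. Switching `ac` to `"H"` gives 3.1 instead, which is one way to sanity-check the assumed AC:L.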

Potential Impact

For European organizations, the impact of this vulnerability primarily concerns availability disruptions in services that utilize the huggingface/transformers library for Japanese language processing. Organizations involved in AI, natural language processing, or machine learning that deploy GPT-NeoX-Japanese models could experience service slowdowns or outages if malicious actors supply crafted inputs triggering the ReDoS condition. This can affect customer-facing applications, automated translation services, chatbots, or any system relying on Japanese tokenization. While the vulnerability does not compromise confidentiality or integrity, the denial of service can lead to operational downtime, degraded user experience, and potential financial losses. Additionally, organizations with compliance requirements for service availability (e.g., financial institutions or critical infrastructure providers) may face regulatory scrutiny if disruptions occur. Given the medium CVSS score and the requirement for user interaction, the threat is moderate but should not be underestimated in environments with high traffic or exposed NLP endpoints.

Mitigation Recommendations

Implement input validation and sanitization to detect and block unusually long or suspiciously crafted Japanese text inputs before they reach the tokenization process. Deploy rate limiting and request throttling on APIs or services that expose the tokenization functionality to reduce the risk of repeated exploitation attempts. Monitor CPU usage and application performance metrics closely to detect spikes that may indicate attempted ReDoS attacks. Isolate the tokenization service or run it in a sandboxed environment with resource constraints (e.g., CPU time limits) to prevent system-wide impact. Engage with the huggingface community or maintainers to track the release of patches or updates addressing this vulnerability and apply them promptly once available. Consider fallback mechanisms or alternative tokenization libraries for Japanese text processing if immediate patching is not feasible. Educate developers and security teams about the risks of ReDoS vulnerabilities and encourage secure regex design and testing practices in NLP components.
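The input-length cap and resource-constrained sandboxing recommendations can be combined. The following minimal Python sketch runs the regex work in a separate interpreter via `subprocess.run(..., timeout=...)`, which kills the child process when the time budget is exceeded, so a catastrophic-backtracking input cannot pin the serving process. `MAX_CHARS`, `TIMEOUT_S`, `WORKER` and `guarded_match` are hypothetical names with arbitrary limits, and the worker uses the illustrative `(a+)+$` pattern as a stand-in for a real tokenizer call; it does not use the transformers API.

```python
import subprocess
import sys

MAX_CHARS = 2_000  # hypothetical input-length cap
TIMEOUT_S = 2.0    # hypothetical per-request time budget in seconds

# Hypothetical worker run in a child interpreter; a real service would call the
# tokenizer here. The regex is a deliberately vulnerable stand-in.
WORKER = r"""
import re, sys
text = sys.stdin.buffer.read().decode("utf-8")
pattern = re.compile(r"(a+)+$")
print("match" if pattern.match(text) else "no match")
"""


def guarded_match(text: str) -> str:
    """Reject oversized input, then run the regex in a killable child process."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"input longer than {MAX_CHARS} characters")
    try:
        result = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=text.encode("utf-8"),
            capture_output=True,
            timeout=TIMEOUT_S,  # child is killed if this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        return "rejected: timeout"
    return result.stdout.decode("utf-8").strip()


if __name__ == "__main__":
    print(guarded_match("aaaa"))           # normal input completes
    print(guarded_match("a" * 40 + "!"))   # adversarial input hits the timeout
```

Starting a fresh interpreter per request is costly. A production service would more likely keep a pool of worker processes, or on POSIX systems cap CPU time with `resource.setrlimit(resource.RLIMIT_CPU, ...)`, but the principle of enforcing a hard limit outside the regex engine is the same.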


Technical Details

Data Version: 5.1
Assigner Short Name: @huntr_ai
Date Reserved: 2025-02-10T14:13:43.276Z
CISA Enriched: true
CVSS Version: 3.0
State: PUBLISHED

Threat ID: 682d983dc4522896dcbef0d4

Added to database: 5/21/2025, 9:09:17 AM

Last enriched: 6/24/2025, 11:05:00 PM

Last updated: 8/11/2025, 4:02:24 AM
