CVE-2025-1194: CWE-1333 Inefficient Regular Expression Complexity in huggingface huggingface/transformers
A Regular Expression Denial of Service (ReDoS) vulnerability was identified in the huggingface/transformers library, specifically in the file `tokenization_gpt_neox_japanese.py` of the GPT-NeoX-Japanese model. The vulnerability occurs in the SubWordJapaneseTokenizer class, where regular expressions process specially crafted inputs. The issue stems from a regex exhibiting exponential complexity under certain conditions, leading to excessive backtracking. This can result in high CPU usage and potential application downtime, effectively creating a Denial of Service (DoS) scenario. The affected version is v4.48.1 (latest).
AI Analysis
Technical Summary
CVE-2025-1194 is a Regular Expression Denial of Service (ReDoS) vulnerability in the huggingface/transformers library, in the tokenization component of the GPT-NeoX-Japanese model. It resides in the SubWordJapaneseTokenizer class, implemented in `tokenization_gpt_neox_japanese.py`, which uses regular expressions to process Japanese text during tokenization. One of these regular expressions exhibits exponential time complexity on certain crafted inputs, causing excessive backtracking during evaluation. Exploitation leads to high CPU consumption and can degrade application performance or cause downtime, resulting in a Denial of Service (DoS). The affected version is v4.48.1, the latest release at the time of disclosure.
The CVSS v3.0 base score is 4.3 (medium severity). The attack vector is network-based (AV:N) and requires no privileges (PR:N), but it does require user interaction (UI:R). Only availability is affected (A:L); confidentiality and integrity are not. No known exploits have been reported in the wild, and no official patch has been linked yet.
The vulnerability is categorized under CWE-1333 (Inefficient Regular Expression Complexity), the class of issues in which poorly constructed regex patterns can be exploited to exhaust resources. It is most relevant to applications or services that use the transformers library for Japanese language processing and expose tokenization to untrusted input, such as web APIs or user-facing NLP services.
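To illustrate what "excessive backtracking" means in practice, the sketch below times a textbook pathological pattern, `(a+)+$`, against inputs that almost match. This pattern is an assumption chosen for illustration only; it is not the regular expression used in `tokenization_gpt_neox_japanese.py`, and the helper name `time_match` is hypothetical.

```python
import re
import time

# Textbook ReDoS pattern (illustrative only, NOT the transformers regex):
# the nested quantifier lets the engine split a run of "a" characters in
# exponentially many ways before it concludes that no match exists.
PATTERN = re.compile(r"(a+)+$")


def time_match(n: int) -> tuple[float, bool]:
    """Match PATTERN against n 'a' characters followed by a 'b'.

    The trailing 'b' guarantees failure, which forces the backtracking
    engine to try every way of grouping the 'a' run.
    Returns (elapsed_seconds, matched).
    """
    text = "a" * n + "b"
    start = time.perf_counter()
    matched = PATTERN.match(text) is not None
    return time.perf_counter() - start, matched


if __name__ == "__main__":
    # Small sizes only: each extra character roughly doubles the work.
    for n in (12, 16, 20):
        elapsed, matched = time_match(n)
        print(f"n={n:2d} matched={matched} elapsed={elapsed:.4f}s")
```

On a backtracking engine such as Python's `re`, the elapsed time should grow roughly geometrically with `n`, which is why a short crafted string can be enough to occupy a CPU core.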
Potential Impact
For European organizations, the impact of this vulnerability primarily concerns availability disruptions in services that utilize the huggingface/transformers library for Japanese language processing. Organizations involved in AI, natural language processing, or machine learning that deploy GPT-NeoX-Japanese models could experience service slowdowns or outages if malicious actors supply crafted inputs triggering the ReDoS condition. This can affect customer-facing applications, automated translation services, chatbots, or any system relying on Japanese tokenization. While the vulnerability does not compromise confidentiality or integrity, the denial of service can lead to operational downtime, degraded user experience, and potential financial losses. Additionally, organizations with compliance requirements for service availability (e.g., financial institutions or critical infrastructure providers) may face regulatory scrutiny if disruptions occur. Given the medium CVSS score and the requirement for user interaction, the threat is moderate but should not be underestimated in environments with high traffic or exposed NLP endpoints.
Mitigation Recommendations
- Implement input validation and sanitization to detect and block unusually long or suspiciously crafted Japanese text inputs before they reach the tokenization process.
- Deploy rate limiting and request throttling on APIs or services that expose the tokenization functionality to reduce the risk of repeated exploitation attempts.
- Monitor CPU usage and application performance metrics closely to detect spikes that may indicate attempted ReDoS attacks.
- Isolate the tokenization service or run it in a sandboxed environment with resource constraints (e.g., CPU time limits) to prevent system-wide impact.
- Engage with the huggingface community or maintainers to track the release of patches or updates addressing this vulnerability and apply them promptly once available.
- Consider fallback mechanisms or alternative tokenization libraries for Japanese text processing if immediate patching is not feasible.
- Educate developers and security teams about the risks of ReDoS vulnerabilities and encourage secure regex design and testing practices in NLP components.
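The first two recommendations (input validation and rate limiting) could be sketched roughly as follows, as a guard placed in front of whatever code calls the tokenizer. Everything here is an assumption for illustration: `MAX_INPUT_CHARS`, `validate_text`, and `TokenBucket` are hypothetical names, the 2000-character limit is arbitrary, and a length cap only reduces exposure to an exponential-time regex rather than removing it.

```python
import time
import unicodedata

MAX_INPUT_CHARS = 2000  # assumed limit; tune to the real workload


def validate_text(text: str, max_chars: int = MAX_INPUT_CHARS) -> str:
    """Reject oversized or control-character input before tokenization."""
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    if len(text) > max_chars:
        raise ValueError(f"input too long: {len(text)} > {max_chars} characters")
    # Reject control characters other than common whitespace
    # (an assumed heuristic for "suspiciously crafted" input).
    for ch in text:
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            raise ValueError("input contains control characters")
    return text


class TokenBucket:
    """Minimal token-bucket rate limiter (one bucket per client is assumed)."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable clock, useful for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when throttled."""
        now = self.clock()
        refill = (now - self.updated) * self.rate
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A request handler would call `bucket.allow()` first and `validate_text(payload)` second, passing the text to the tokenizer only if both succeed. Running the tokenizer in a separate, resource-limited process remains the stronger control, since neither check bounds the time spent on a single short but pathological input.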
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland
Technical Details
- Data Version: 5.1
- Assigner Short Name: @huntr_ai
- Date Reserved: 2025-02-10T14:13:43.276Z
- CISA Enriched: true
- CVSS Version: 3.0
- State: PUBLISHED
Threat ID: 682d983dc4522896dcbef0d4
Added to database: 5/21/2025, 9:09:17 AM
Last enriched: 6/24/2025, 11:05:00 PM
Last updated: 8/11/2025, 4:02:24 AM
Related Threats
- CVE-2025-41242: Vulnerability in VMware Spring Framework (Medium)
- CVE-2025-47206: CWE-787 in QNAP Systems Inc. File Station 5 (High)
- CVE-2025-5296: CWE-59 Improper Link Resolution Before File Access ('Link Following') in Schneider Electric SESU (High)
- CVE-2025-6625: CWE-20 Improper Input Validation in Schneider Electric Modicon M340 (High)
- CVE-2025-57703: CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') in Delta Electronics DIAEnergie (Medium)