CVE-2025-6638: CWE-1333 Inefficient Regular Expression Complexity in huggingface huggingface/transformers
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically affecting the MarianTokenizer's `remove_language_code()` method. This vulnerability is present in version 4.52.4 and has been fixed in version 4.53.0. The issue arises from inefficient regex processing, which can be exploited by crafted input strings containing malformed language code patterns, leading to excessive CPU consumption and potential denial of service.
AI Analysis
Technical Summary
CVE-2025-6638 is a Regular Expression Denial of Service (ReDoS) vulnerability identified in the Hugging Face Transformers library, specifically within the MarianTokenizer's `remove_language_code()` method. It stems from inefficient regular-expression processing of input strings that contain malformed or crafted language code patterns. Version 4.52.4 (and potentially earlier releases) is affected; the issue is resolved in version 4.53.0.

An attacker can supply specially crafted input that triggers excessive CPU consumption through the regex engine's backtracking behavior, producing a denial of service. The vulnerability does not compromise confidentiality or integrity; it impacts availability by exhausting processing resources. The CVSS score of 5.3 (medium severity) reflects that the attack can be launched remotely without authentication or user interaction, but the impact is limited to availability degradation. No exploits had been reported in the wild as of the publication date. The root cause is inefficient regex complexity (CWE-1333), a common weakness in which certain patterns exhibit exponential matching time on malicious inputs.

Because Hugging Face Transformers is widely used in natural language processing (NLP) applications, especially those involving tokenization and machine translation, any service or application that runs the vulnerable tokenizer method on untrusted input could be affected. Attackers could exploit this by sending maliciously crafted text inputs to NLP services, causing service slowdowns or outages.
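The backtracking failure mode behind CWE-1333 can be demonstrated with a classic nested-quantifier pattern. The regex below is deliberately illustrative and is not the pattern used by MarianTokenizer; it simply shows why a near-miss input is exponentially more expensive than a matching one:

```python
import re
import time

# Illustrative ReDoS pattern -- NOT the regex used by MarianTokenizer.
# Nested quantifiers such as (a+)+ force the backtracking engine to try
# exponentially many ways to partition the input when the overall match fails.
evil = re.compile(r"^(a+)+$")

def timed_match(s: str) -> float:
    """Return the wall-clock time spent attempting a match on s."""
    t0 = time.perf_counter()
    evil.match(s)
    return time.perf_counter() - t0

# A matching input succeeds immediately; a near-miss (trailing "b") fails
# only after exhaustive backtracking, so its cost grows exponentially.
fast = timed_match("a" * 20)
slow = timed_match("a" * 20 + "b")
print(f"match: {fast:.6f}s, near-miss: {slow:.6f}s")
```

Each extra `a` roughly doubles the near-miss cost, which is why a short crafted string is enough to pin a CPU core.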
Potential Impact
For European organizations, the impact primarily concerns availability disruptions in NLP-driven services or applications that integrate the Hugging Face Transformers library, particularly the MarianTokenizer component. Organizations leveraging these models for language translation, chatbots, content moderation, or other AI-driven text processing could experience service degradation or denial of service if exposed to crafted inputs exploiting this vulnerability. This could affect customer-facing applications, internal automation, or data processing pipelines, potentially leading to operational interruptions and reputational damage. Since the vulnerability does not affect confidentiality or integrity, data breaches or unauthorized data manipulation are not direct concerns. However, denial of service conditions could impact critical services, especially in sectors like finance, healthcare, or government where NLP tools are increasingly integrated. The medium severity rating suggests that while the risk is not critical, it should be addressed promptly to maintain service reliability and prevent potential exploitation in production environments.
Mitigation Recommendations
European organizations should:

- Upgrade the Hugging Face Transformers library to version 4.53.0 or later, where the vulnerability has been fixed.
- If immediate upgrading is not feasible, implement input validation and sanitization to detect and reject malformed or suspicious language code patterns before they reach the vulnerable tokenizer method.
- Apply rate limiting and anomaly detection on text input endpoints to limit the volume of potentially malicious inputs.
- Monitor CPU usage and application performance metrics for early warning signs of attempted ReDoS attacks.
- For critical systems, isolate NLP processing components behind dedicated services with resource constraints (e.g., CPU quotas, timeouts) to prevent cascading failures.
- Review deployment architectures to ensure that untrusted user inputs are not directly processed by vulnerable components without proper filtering.
- Maintain awareness of updates from Hugging Face and related security advisories to promptly apply patches.
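The input-validation recommendation can be sketched as a pre-filter applied before text reaches the tokenizer. Everything here is an assumption for illustration: the length cap, the strict `>>code<<` marker form, and the `prefilter` helper are not part of the Transformers API.

```python
import re

MAX_LEN = 10_000  # assumed length cap; tune per deployment

# Assumption: well-formed Marian language-code markers look like ">>fr<<"
# or ">>fra<<". This bounded pattern has no nested quantifiers, so it runs
# in linear time; any text with stray ">>" / "<<" sequences that do not
# form a strict marker is rejected before reaching the tokenizer.
LANG_CODE = re.compile(r">>[A-Za-z_\-]{2,16}<<")

def prefilter(text: str) -> str:
    """Reject oversized input or malformed language-code markers."""
    if len(text) > MAX_LEN:
        raise ValueError("input exceeds length cap")
    leftover = LANG_CODE.sub("", text)
    if ">>" in leftover or "<<" in leftover:
        raise ValueError("malformed language-code marker")
    return text

print(prefilter(">>fr<< Bonjour tout le monde"))
```

For defense in depth, tokenization of untrusted text can additionally be run in a worker process with a hard deadline (e.g. `concurrent.futures.ProcessPoolExecutor` with `future.result(timeout=...)`), since a thread cannot interrupt a regex match that is already running.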
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Italy, Spain
Technical Details
- Data Version: 5.1
- Assigner Short Name: @huntr_ai
- Date Reserved: 2025-06-25T14:07:29.841Z
- CVSS Version: 3.0
- State: PUBLISHED
Threat ID: 68c3fa7ca466eacfca92b720
Added to database: 9/12/2025, 10:48:28 AM
Last enriched: 9/12/2025, 10:48:46 AM
Last updated: 10/30/2025, 12:13:27 PM
Related Threats
- CVE-2025-10317: CWE-352 Cross-Site Request Forgery (CSRF) in OpenSolution Quick.Cart (Medium)
- Canada Says Hackers Tampered With ICS at Water Facility, Oil and Gas Firm (Medium)
- CVE-2025-39663: CWE-80 Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS) in Checkmk GmbH Checkmk (High)
- CVE-2025-53883: CWE-80 Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS) in SUSE Container suse manager 5.0 (Critical)
- 136 NPM Packages Delivering Infostealers Downloaded 100,000 Times (Medium)