CVE-2025-6638: CWE-1333 Inefficient Regular Expression Complexity in huggingface huggingface/transformers

Severity: Medium
Type: Vulnerability
Tags: cve, cve-2025-6638, cwe-1333
Published: Fri Sep 12 2025 (09/12/2025, 10:46:07 UTC)
Source: CVE Database V5
Vendor/Project: huggingface
Product: huggingface/transformers

Description

A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically affecting the MarianTokenizer's `remove_language_code()` method. This vulnerability is present in version 4.52.4 and has been fixed in version 4.53.0. The issue arises from inefficient regex processing, which can be exploited by crafted input strings containing malformed language code patterns, leading to excessive CPU consumption and potential denial of service.

AI-Powered Analysis

Last updated: 09/12/2025, 10:48:46 UTC

Technical Analysis

CVE-2025-6638 is a Regular Expression Denial of Service (ReDoS) vulnerability in the Hugging Face Transformers library, located in the `remove_language_code()` method of MarianTokenizer, the tokenizer used with Marian machine translation models. The method applies a regular expression to its input, and crafted strings containing malformed language code patterns force the regex engine into excessive backtracking, which consumes CPU until the match attempt finishes or the process is stopped. The root cause is classified as CWE-1333 (Inefficient Regular Expression Complexity): a pattern whose matching time grows super-linearly (polynomially or exponentially, depending on the pattern) with the length of a malicious input. The vulnerability is present in version 4.52.4 and fixed in version 4.53.0; earlier versions may also be affected.

The flaw affects availability only; confidentiality and integrity are not compromised. The CVSS base score of 5.3 (Medium) reflects an attack that can be launched remotely without authentication or user interaction, but whose impact is limited to degraded availability. No exploits had been reported in the wild as of the publication date. Any service or application that passes untrusted text to the vulnerable tokenizer method is exposed: an attacker can send crafted text to an NLP endpoint and cause slowdowns or outages.
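
The backtracking behaviour behind this class of bug can be reproduced with a textbook pattern. The sketch below is illustrative only: `(a+)+$` is a classic ReDoS example and is not the expression used inside MarianTokenizer, and the helper name `time_match` is hypothetical. It times a failing match in Python's standard `re` module for inputs of increasing length.

```python
import re
import time

# Textbook catastrophic-backtracking pattern (an assumption for illustration;
# this is NOT the regex used inside MarianTokenizer).
EVIL_PATTERN = re.compile(r"(a+)+$")


def time_match(n: int) -> float:
    """Return the seconds spent trying to match n 'a' characters followed by 'b'."""
    payload = "a" * n + "b"  # the trailing 'b' guarantees the match fails
    start = time.perf_counter()
    result = EVIL_PATTERN.match(payload)
    elapsed = time.perf_counter() - start
    assert result is None  # no match; the elapsed time is backtracking cost
    return elapsed


if __name__ == "__main__":
    # Keep n small: the cost grows very quickly with each extra character.
    for n in (8, 12, 16, 20):
        print(f"n={n:2d}  {time_match(n):.4f}s")
```

On a backtracking engine the measured time grows far faster than the input length, which is why a short crafted string can be enough to occupy a CPU core.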

Potential Impact

For European organizations, the impact is limited to availability: NLP-driven services that integrate the Hugging Face Transformers library, and the MarianTokenizer component in particular, can be slowed or taken offline by crafted inputs. Organizations using these models for language translation, chatbots, content moderation, or other text processing may see degraded customer-facing applications, internal automation, or data processing pipelines, with resulting operational interruptions and reputational damage. Because confidentiality and integrity are not affected, data breaches and unauthorized data manipulation are not direct concerns. Denial of service can still disrupt critical services, especially in sectors such as finance, healthcare, and government, where NLP tools are increasingly integrated. The Medium severity rating indicates a non-critical risk that should nonetheless be addressed promptly to keep production services reliable.

Mitigation Recommendations

European organizations should upgrade the Hugging Face Transformers library to version 4.53.0 or later, where the vulnerability is fixed. If an immediate upgrade is not feasible, validate and sanitize input so that malformed or suspicious language code patterns are rejected before they reach the vulnerable tokenizer method. Rate limiting and anomaly detection on text input endpoints reduce the volume of potentially malicious input, and monitoring CPU usage and application performance metrics can give early warning of attempted ReDoS attacks. For critical systems, consider isolating NLP processing behind dedicated services with resource constraints (e.g., CPU quotas, timeouts) to prevent cascading failures. Security teams should also review deployment architectures to ensure that untrusted user input is never processed by vulnerable components without filtering. Finally, follow Hugging Face releases and related security advisories so that patches can be applied promptly.
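
As a minimal sketch of the first two recommendations, the block below checks a version string against the fixed release and pre-screens input without using a backtracking regex on untrusted text. The helper names `is_patched` and `split_language_prefix`, the `MAX_INPUT_CHARS` cap, and the `>>code<<` prefix format (the convention Marian models use for target-language tokens) are assumptions for illustration; this is not the library's own fix.

```python
import re

FIXED_VERSION = (4, 53, 0)  # first release the advisory lists as fixed
MAX_INPUT_CHARS = 2000      # hypothetical cap; tune for your workload


def is_patched(version: str) -> bool:
    """Return True if a transformers version string is >= 4.53.0.

    Pre-release suffixes (e.g. '.dev0') are ignored, so treat dev builds
    of the fixed release with care.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in m.groups()) >= FIXED_VERSION


def split_language_prefix(text: str, max_chars: int = MAX_INPUT_CHARS):
    """Split a leading '>>code<<' marker from text using only str methods.

    Returns (code_or_None, remainder). Raises ValueError for oversized input.
    Both str.startswith and str.find run in linear time, so there is no
    backtracking to exploit.
    """
    if len(text) > max_chars:
        raise ValueError("input exceeds configured length cap")
    if text.startswith(">>"):
        end = text.find("<<", 2)
        if end != -1:
            return text[: end + 2], text[end + 2:]
    return None, text


if __name__ == "__main__":
    print(is_patched("4.52.4"))
    print(split_language_prefix(">>fr<< Hello world"))
```

In a deployment, the installed version can be read with `importlib.metadata.version("transformers")` and passed to `is_patched`; the length cap and prefix check would sit in front of the tokenizer call on any endpoint that accepts untrusted text.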

Technical Details

Data Version: 5.1
Assigner Short Name: @huntr_ai
Date Reserved: 2025-06-25T14:07:29.841Z
CVSS Version: 3.0
State: PUBLISHED

Threat ID: 68c3fa7ca466eacfca92b720

Added to database: 9/12/2025, 10:48:28 AM

Last enriched: 9/12/2025, 10:48:46 AM

Last updated: 9/12/2025, 2:14:06 PM
