
CVE-2025-6211: CWE-440 Expected Behavior Violation in run-llama run-llama/llama_index

Severity: Medium
Published: Thu Jul 10 2025 (07/10/2025, 13:04:34 UTC)
Source: CVE Database V5
Vendor/Project: run-llama
Product: run-llama/llama_index

Description

A vulnerability in the DocugamiReader class of the run-llama/llama_index repository, up to version 0.12.28, involves the use of MD5 hashing to generate IDs for document chunks. This approach leads to hash collisions when structurally distinct chunks contain identical text, resulting in one chunk overwriting another. This can cause loss of semantically or legally important document content, breakage of parent-child chunk hierarchies, and inaccurate or hallucinated responses in AI outputs. The issue is resolved in version 0.3.1.

AI-Powered Analysis

Last updated: 07/10/2025, 13:31:20 UTC

Technical Analysis

CVE-2025-6211 is a medium-severity vulnerability in the DocugamiReader class of the run-llama/llama_index repository, affecting versions up to 0.12.28. The class generates the identifier (ID) of each document chunk by taking an MD5 hash of the chunk's text content. Because the hash input is the text alone, two structurally distinct chunks that contain identical text receive the same ID. This follows from hashing identical input, not from MD5's known cryptographic weaknesses: any deterministic hash applied to the same text would produce the same result.

When two chunks share an ID, one overwrites the other, so semantically or legally important document content can be lost. The overwrite also breaks the parent-child chunk hierarchies that preserve the document's structure. AI models that rely on these chunks may then produce inaccurate or hallucinated outputs, which undermines the reliability of AI-driven document processing and analysis.

The vulnerability does not affect confidentiality; it affects the integrity and availability of document data. According to the CVE description, the issue is resolved in version 0.3.1. The description does not say how the fix works, and it does not explain why the fixed version number is lower than the affected 0.12.28; it may refer to a separately versioned package. The CVSS score is 6.5 (medium), reflecting an issue that is exploitable over the network without privileges or user interaction and that impacts integrity and availability but not confidentiality. No known exploits in the wild had been reported as of the publication date.
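The overwrite described above can be illustrated with a few lines of Python. This is a minimal sketch, not the DocugamiReader code: the `text_only_id` helper, the chunk dictionaries, and the XPath-like `path` values are all hypothetical, and a plain dict stands in for whatever store holds the chunks.

```python
import hashlib


def text_only_id(text: str) -> str:
    """Hypothetical ID scheme: hash nothing but the chunk text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Three structurally distinct chunks; two of them carry identical text.
chunks = [
    {"path": "/contract/section[1]/clause[2]", "text": "Not applicable."},
    {"path": "/contract/section[4]/clause[1]", "text": "Not applicable."},
    {"path": "/contract/section[4]/clause[2]", "text": "Payment is due in 30 days."},
]

# A dict keyed by ID stands in for the chunk store (an assumption).
store = {}
for chunk in chunks:
    store[text_only_id(chunk["text"])] = chunk  # a later chunk replaces an earlier one

surviving_paths = sorted(c["path"] for c in store.values())
print(len(chunks), len(store))  # 3 2

# Swapping in SHA-256 changes nothing: identical input, identical digest.
same_sha = (
    hashlib.sha256(b"Not applicable.").hexdigest()
    == hashlib.sha256(b"Not applicable.").hexdigest()
)
```

Three chunks go in and two come out: the clause at `/contract/section[1]/clause[2]` is silently replaced by the later chunk with the same text, and any child that pointed to it by ID would now resolve to the wrong parent.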

Potential Impact

For European organizations, especially those relying on AI-driven document processing tools like run-llama/llama_index, this vulnerability can lead to significant operational and legal risks. Loss or corruption of document chunks can result in incomplete or misleading data being fed into AI models, causing inaccurate outputs that may affect decision-making, compliance reporting, or legal document handling. Sectors such as legal, financial services, healthcare, and government agencies that process sensitive or regulated documents are particularly at risk. The integrity breach could undermine trust in automated document analysis, potentially leading to regulatory non-compliance or erroneous business outcomes. Additionally, the disruption of document hierarchies may complicate audits or forensic investigations. While the vulnerability does not directly expose confidential data, the loss of data integrity and availability can have cascading effects on business processes and AI reliability.

Mitigation Recommendations

European organizations using run-llama/llama_index should upgrade to version 0.3.1 or later, where the CVE description states the vulnerability is fixed. If upgrading is not immediately feasible, consider making chunk IDs depend on more than the text, for example by appending structural metadata (such as the chunk's position in the document hierarchy) to the hash input. Switching to a stronger hash such as SHA-256 does not help on its own, because identical text still hashes to an identical ID; it is useful only in combination with a structure-aware input. It is also advisable to audit existing document chunk data for signs of overwriting or data loss and to maintain robust backups of original documents. Organizations should monitor AI output quality closely for hallucinations or inaccuracies that may indicate underlying data integrity issues. Finally, integrating integrity checks and version control mechanisms for document chunks can help detect and prevent silent data corruption.
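The idea of appending structural metadata to the hash input, together with a simple audit for chunks at risk of overwriting, might look like the following. This is a sketch under stated assumptions, not the fix shipped by the project: `structural_id`, `find_text_duplicates`, and the `path`/`text` chunk fields are hypothetical names chosen for illustration.

```python
import hashlib
from collections import defaultdict


def structural_id(path: str, text: str) -> str:
    """Hypothetical ID scheme: hash the structural path together with the text.

    The path is length-prefixed so that different (path, text) pairs cannot
    be concatenated into the same byte string.
    """
    payload = f"{len(path)}:{path}|{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_text_duplicates(chunks):
    """Audit helper: map each repeated text to the paths that share it."""
    by_text = defaultdict(list)
    for chunk in chunks:
        by_text[chunk["text"]].append(chunk["path"])
    return {text: paths for text, paths in by_text.items() if len(paths) > 1}


chunks = [
    {"path": "/contract/section[1]/clause[2]", "text": "Not applicable."},
    {"path": "/contract/section[4]/clause[1]", "text": "Not applicable."},
    {"path": "/contract/section[4]/clause[2]", "text": "Payment is due in 30 days."},
]

# Every chunk now gets its own ID, even when the text repeats.
ids = {structural_id(c["path"], c["text"]) for c in chunks}

# Texts that a text-only scheme would have collapsed into one entry.
at_risk = find_text_duplicates(chunks)
```

With this scheme the IDs stay deterministic (the same path and text always yield the same ID), so re-ingesting an unchanged document does not create duplicates. The audit helper only flags candidates; recovering a chunk that was already overwritten still requires the original document.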


Technical Details

Data Version: 5.1
Assigner Short Name: @huntr_ai
Date Reserved: 2025-06-17T17:36:01.333Z
CVSS Version: 3.0
State: PUBLISHED

Threat ID: 686fbd17a83201eaaca7d1d9

Added to database: 7/10/2025, 1:16:07 PM

Last enriched: 7/10/2025, 1:31:20 PM

Last updated: 7/13/2025, 6:39:40 PM
