CVE-2025-6211: CWE-440 Expected Behavior Violation in run-llama run-llama/llama_index
A vulnerability in the DocugamiReader class of the run-llama/llama_index repository, up to version 0.12.28, involves the use of MD5 hashing to generate IDs for document chunks. This approach leads to hash collisions when structurally distinct chunks contain identical text, resulting in one chunk overwriting another. This can cause loss of semantically or legally important document content, breakage of parent-child chunk hierarchies, and inaccurate or hallucinated responses in AI outputs. The issue is resolved in version 0.3.1.
AI Analysis
Technical Summary
CVE-2025-6211 is a medium-severity vulnerability in the DocugamiReader class of the run-llama/llama_index repository, affecting versions up to 0.12.28. DocugamiReader generates the ID of each document chunk by taking an MD5 hash of the chunk's text. Because a hash function is deterministic, two structurally distinct chunks that contain identical text receive the same ID. Strictly speaking, this is not a cryptographic collision of the kind MD5 is known to be weak against: the inputs are identical, so any hash of the text alone, including SHA-256, would produce the same result. The root cause is that the ID does not encode the chunk's position in the document. When two chunks share an ID, one overwrites the other, which can drop semantically or legally important content and break the parent-child chunk hierarchy that preserves document structure. AI models that answer from these chunks may then produce inaccurate or hallucinated responses.
The vulnerability affects the integrity and availability of document data but not its confidentiality. The CVSS score is 6.5 (medium), reflecting an issue that is exploitable over the network without privileges or user interaction. The issue is reported as resolved in version 0.3.1; the advisory does not describe the fix, which presumably changes how chunk IDs are derived. The fixed version number is lower than the affected one, which suggests it belongs to a separately versioned package, although the advisory does not say so. No known exploits were reported in the wild as of the publication date.
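The overwrite can be reproduced in a few lines. The following is a minimal sketch, not the actual DocugamiReader code: `text_only_id`, the `section` field, and the dictionary used as a chunk store are all hypothetical stand-ins chosen for illustration. It shows two chunks from different sections colliding on a text-only ID, and that SHA-256 over the same input fares no better.

```python
import hashlib


def text_only_id(text: str) -> str:
    """Hypothetical stand-in for an ID derived solely from chunk text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two structurally distinct chunks (different sections) with identical text.
chunks = [
    {"section": "Clause 4.1", "text": "The term is twelve months."},
    {"section": "Clause 9.3", "text": "The term is twelve months."},
]

# A dictionary keyed by ID models any store where a repeated ID overwrites.
store = {}
for chunk in chunks:
    store[text_only_id(chunk["text"])] = chunk  # second write replaces the first

surviving_sections = [c["section"] for c in store.values()]

# A stronger hash over the same text-only input yields a single ID as well.
sha256_ids = {
    hashlib.sha256(c["text"].encode("utf-8")).hexdigest() for c in chunks
}
```

After the loop, `store` holds one entry and only the chunk from "Clause 9.3" survives; `sha256_ids` also contains a single value, which is why changing the algorithm alone does not address the problem.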
Potential Impact
For European organizations, especially those relying on AI-driven document processing tools like run-llama/llama_index, this vulnerability can lead to significant operational and legal risks. Loss or corruption of document chunks can result in incomplete or misleading data being fed into AI models, causing inaccurate outputs that may affect decision-making, compliance reporting, or legal document handling. Sectors such as legal, financial services, healthcare, and government agencies that process sensitive or regulated documents are particularly at risk. The integrity breach could undermine trust in automated document analysis, potentially leading to regulatory non-compliance or erroneous business outcomes. Additionally, the disruption of document hierarchies may complicate audits or forensic investigations. While the vulnerability does not directly expose confidential data, the loss of data integrity and availability can have cascading effects on business processes and AI reliability.
Mitigation Recommendations
European organizations using run-llama/llama_index should upgrade to version 0.3.1 or later, where the vulnerability is fixed. If upgrading is not immediately feasible, include structural metadata (for example, the chunk's position in the document or its parent ID) in the hash input so that chunks with identical text receive distinct IDs. Switching to a stronger algorithm such as SHA-256 does not help on its own, because identical text still hashes to the same value. It is also advisable to audit existing chunk data for signs of overwriting or data loss, and to keep backups of the original documents so that they can be re-ingested. Organizations should monitor AI output for hallucinations or inaccuracies that may indicate underlying data integrity issues. Finally, integrity checks and version control for document chunks can help detect silent data corruption.
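As an illustration of the interim workaround and the audit step, here is a minimal sketch under stated assumptions. The `xpath` and `parent_id` fields and both function names are hypothetical and not part of the llama_index API; any stable structural locator available in your pipeline would serve the same purpose.

```python
import hashlib


def structural_id(text: str, xpath: str, parent_id: str = "") -> str:
    """Derive an ID from text plus structural position (hypothetical fields).

    The unit separator (0x1f) keeps the fields from running together, so
    ("ab", "c") and ("a", "bc") cannot yield the same payload.
    """
    payload = "\x1f".join((parent_id, xpath, text))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_duplicate_texts(chunks):
    """Return texts that occur at more than one structural position.

    These are the chunks a text-only ID scheme would have merged, so they
    are the first places to check for lost content.
    """
    positions = {}
    for chunk in chunks:
        positions.setdefault(chunk["text"], set()).add(chunk["xpath"])
    return {
        text: sorted(paths) for text, paths in positions.items() if len(paths) > 1
    }


chunks = [
    {"xpath": "/doc/clause[4]/p[1]", "text": "The term is twelve months."},
    {"xpath": "/doc/clause[9]/p[3]", "text": "The term is twelve months."},
    {"xpath": "/doc/clause[9]/p[4]", "text": "Either party may terminate."},
]

ids = {structural_id(c["text"], c["xpath"]) for c in chunks}
duplicates = find_duplicate_texts(chunks)
```

Here `ids` contains three distinct values even though two chunks share their text, and `duplicates` lists the repeated sentence together with both of its positions. Running the audit against the original source documents, rather than the already-ingested store, matters because overwritten chunks are no longer present in the store.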
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Italy, Spain
Technical Details
- Data Version: 5.1
- Assigner Short Name: @huntr_ai
- Date Reserved: 2025-06-17T17:36:01.333Z
- CVSS Version: 3.0
- State: PUBLISHED
Threat ID: 686fbd17a83201eaaca7d1d9
Added to database: 7/10/2025, 1:16:07 PM
Last enriched: 7/10/2025, 1:31:20 PM
Last updated: 7/13/2025, 6:39:40 PM