CVE-2025-6985: CWE-611 Improper Restriction of XML External Entity Reference in langchain-ai langchain-ai/langchain
The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 is vulnerable to XML External Entity (XXE) attacks due to unsafe XSLT parsing. The class accepts arbitrary XSLT stylesheets and parses them with lxml.etree.parse() and lxml.etree.XSLT() without any hardening. In lxml versions up to 4.9.x, external entities are resolved by default, allowing attackers to read arbitrary local files or perform outbound HTTP(S) fetches. In lxml 5.0 and above, external entity expansion is disabled by default, but the XSLT document() function can still read any URI unless XSLTAccessControl is applied. This vulnerability allows remote attackers to gain read-only access to any file the LangChain process can reach, including sensitive files such as SSH keys, environment files, source code, or cloud metadata. No authentication, special privileges, or user interaction are required, and the issue is exploitable in otherwise default deployments that accept custom XSLT stylesheets.
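The unhardened pattern described above can be reproduced locally, for example in a regression test that checks whether a deployment is affected. This is a minimal sketch, not LangChain's source: it assumes lxml is installed and mirrors only the two calls named in the advisory (lxml.etree.parse() on a stylesheet path, then lxml.etree.XSLT() with no access_control). The function names, the stylesheet, and the "secret" file are hypothetical stand-ins created in a temporary directory; the secret is written as XML because document() parses its target as an XML document.

```python
import tempfile
from pathlib import Path

from lxml import etree

# Hypothetical untrusted stylesheet: document() loads a local file by URI.
STYLESHEET_TEMPLATE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:value-of select="document('{uri}')"/></out>
  </xsl:template>
</xsl:stylesheet>
"""


def unhardened_transform(xslt_path: str, xml_text: str) -> str:
    """Mirror the advisory's pattern: etree.parse() then etree.XSLT(), no hardening."""
    xslt_tree = etree.parse(xslt_path)  # default parser settings
    transform = etree.XSLT(xslt_tree)   # no access_control argument
    return str(transform(etree.fromstring(xml_text)))


def read_via_document() -> str:
    """Run the pattern against a throwaway 'secret' file in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.xml"
        # Written as XML because document() parses its target as XML.
        secret.write_text("<secret>s3cr3t-token</secret>")
        stylesheet = Path(tmp) / "untrusted.xslt"
        stylesheet.write_text(STYLESHEET_TEMPLATE.format(uri=secret.as_uri()))
        return unhardened_transform(str(stylesheet), "<html/>")


if __name__ == "__main__":
    # The transform output contains the text of the temporary secret file.
    print(read_via_document())
```

On lxml 4.9.x and earlier, the same default etree.parse() call would additionally expand external entities declared in a stylesheet's DTD; the document() route shown here also works on lxml 5.x, which is why the advisory points to XSLTAccessControl.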
AI Analysis
Technical Summary
CVE-2025-6985 is an XML External Entity (XXE) vulnerability classified under CWE-611, found in the HTMLSectionSplitter class of the langchain-ai/langchain project, specifically in version 0.3.8 of langchain-text-splitters. The root cause is that arbitrary XSLT stylesheets are parsed with the lxml library's etree.parse() and etree.XSLT() functions without security controls such as XSLTAccessControl. In lxml versions up to 4.9.x, external entities are resolved by default, so an attacker-supplied stylesheet can read arbitrary local files or trigger outbound HTTP(S) requests. In lxml 5.0 and later, external entity resolution is disabled by default, but the XSLT document() function can still read any URI unless explicitly restricted. As a result, remote attackers can gain unauthorized read-only access to any file accessible to the LangChain process, including sensitive credentials (e.g., SSH private keys), environment configuration files, source code, or cloud metadata endpoints. Exploitation requires no authentication, special privileges, or user interaction, which makes it easy to exploit in default deployments that accept custom XSLT input. The vulnerability has a CVSS 3.0 base score of 7.5, reflecting a high impact on confidentiality and no impact on integrity or availability. No public exploits have been reported yet, but the risk remains significant given the nature of the vulnerability and the widespread use of LangChain in AI and data processing applications.
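The 7.5 base score quoted above can be checked by hand. The page does not print the full vector, so this sketch assumes AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N, which matches the description (network attack, low complexity, no privileges, no user interaction, high confidentiality impact only). It applies the CVSS v3.0 base-score equations for an unchanged scope; the helper names are illustrative.

```python
import math

# CVSS v3.0 metric weights (values from the v3.0 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope:Unchanged values only
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """CVSS v3.0 Roundup: smallest value with one decimal place that is >= x."""
    return math.ceil(x * 10) / 10


def base_score(vector: str) -> float:
    """Base score for a Scope:Unchanged vector such as 'AV:N/AC:L/...'."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope:Unchanged")
    isc_base = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * isc_base
    exploitability = (8.22 * AV[m["AV"]] * AC[m["AC"]]
                      * PR_UNCHANGED[m["PR"]] * UI[m["UI"]])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"))  # 7.5
```

For this vector the impact subscore is about 3.6 (6.42 × 0.56) and the exploitability subscore about 3.9, which sum to roughly 7.48 before rounding up to 7.5.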
Potential Impact
For European organizations, this vulnerability poses a significant risk to confidentiality, potentially exposing sensitive internal files, credentials, and cloud metadata that could lead to further compromise or data breaches. Organizations using LangChain for AI workflows, document processing, or automation could have critical intellectual property, customer data, or operational secrets exposed. The ability to read cloud metadata is particularly concerning for deployments on public cloud platforms, as it could lead to credential theft and lateral movement within cloud environments. Since no authentication or user interaction is required, attackers can exploit this vulnerability remotely if the service accepts untrusted XSLT input. Sectors with high data sensitivity, such as finance, healthcare, and government, could be affected across Europe. Additionally, exposure of SSH keys or environment variables could facilitate persistent access or privilege escalation. The vulnerability also undermines trust in AI and automation pipelines that rely on LangChain, potentially disrupting digital transformation initiatives.
Mitigation Recommendations
European organizations should immediately audit their use of langchain-ai/langchain, especially the HTMLSectionSplitter class and any features that accept custom XSLT stylesheets. Mitigation steps include:
1) Upgrade to a patched version of langchain-text-splitters once available, or apply vendor-provided patches.
2) If upgrading is not immediately possible, disable custom XSLT inputs or restrict them to trusted sources only.
3) When using lxml.etree.XSLT, apply XSLTAccessControl (for example XSLTAccessControl.DENY_ALL) to block file and network access through document(), and parse stylesheets with an XMLParser that disables entity resolution and network access.
4) Use runtime security controls such as container or process-level file system restrictions to limit which files LangChain processes can read.
5) Monitor logs for unusual file access patterns or outbound HTTP(S) requests initiated by LangChain components.
6) Conduct internal code reviews and penetration tests focused on XML and XSLT processing components.
7) Educate developers and DevOps teams about the risks of unsafe XML/XSLT parsing and enforce secure coding practices.
8) Consider network-level controls that restrict outbound HTTP(S) traffic from LangChain hosts to prevent data exfiltration via document().
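Items 2 and 3 can be combined in a small stylesheet loader. This is a hedged sketch assuming lxml is installed, not the upstream LangChain fix; load_trusted_stylesheet, HARDENED_PARSER, and document_call_is_blocked are hypothetical names. The loader refuses any stylesheet path that is not on an explicit allowlist, parses the stylesheet with an XMLParser that disables entity resolution, DTD loading, and network access, and compiles it with XSLTAccessControl.DENY_ALL so that a document() read fails with XSLTApplyError instead of returning file contents.

```python
import tempfile
from pathlib import Path

from lxml import etree

# Stylesheet parser: no entity expansion, no DTD loading, no network access.
HARDENED_PARSER = etree.XMLParser(
    resolve_entities=False, load_dtd=False, no_network=True
)


def load_trusted_stylesheet(xslt_path, allowed_paths):
    """Hypothetical helper: compile an XSLT file only if it is allowlisted."""
    resolved = Path(xslt_path).resolve()
    if resolved not in {Path(p).resolve() for p in allowed_paths}:
        raise ValueError(f"stylesheet not on allowlist: {resolved}")
    xslt_tree = etree.parse(str(resolved), HARDENED_PARSER)
    # DENY_ALL denies file and network reads/writes during the transform,
    # which is what document() needs.
    return etree.XSLT(xslt_tree, access_control=etree.XSLTAccessControl.DENY_ALL)


def document_call_is_blocked() -> bool:
    """Check that a document()-based file read fails under the hardened loader."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.xml"
        secret.write_text("<secret>s3cr3t-token</secret>")
        xslt = Path(tmp) / "probe.xslt"
        xslt.write_text(
            '<xsl:stylesheet version="1.0" '
            'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
            '<xsl:template match="/"><out>'
            f"<xsl:value-of select=\"document('{secret.as_uri()}')\"/>"
            "</out></xsl:template></xsl:stylesheet>"
        )
        transform = load_trusted_stylesheet(xslt, allowed_paths=[xslt])
        try:
            transform(etree.fromstring("<html/>"))
        except etree.XSLTApplyError:
            return True
        return False


if __name__ == "__main__":
    print(document_call_is_blocked())
```

DENY_ALL will also break stylesheets that legitimately call document(). lxml's XSLTAccessControl constructor accepts individual flags (read_file, write_file, create_dir, read_network, write_network), but leaving read_file enabled reopens the local-file path described in this advisory, so the allowlist then becomes the main control.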
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Denmark, Ireland, Belgium, Italy
Technical Details
- Data Version: 5.1
- Assigner Short Name: @huntr_ai
- Date Reserved: 2025-07-01T20:33:58.220Z
- CVSS Version: 3.0
- State: PUBLISHED
Threat ID: 68e405fa64f972a16d673f26
Added to database: 10/6/2025, 6:10:02 PM
Last enriched: 10/6/2025, 6:17:49 PM
Last updated: 10/7/2025, 1:13:16 PM