CVE-2026-40682: CWE-611 Improper Restriction of XML External Entity Reference in Apache Software Foundation Apache OpenNLP
XML External Entity (XXE) via Unsanitized Dictionary Parsing in Apache OpenNLP DictionaryEntryPersistor
Versions Affected: before 2.5.9, before 3.0.0-M3
Description: The DictionaryEntryPersistor class initializes a static SAXParserFactory at class-load time without enabling FEATURE_SECURE_PROCESSING or disabling DTD processing. When create(InputStream, EntryInserter) is invoked, the only feature set on the XMLReader is namespace support; external entity resolution and DOCTYPE declarations remain fully enabled. An attacker who can supply a crafted dictionary file (e.g., a stop-word list or domain dictionary) containing a malicious DOCTYPE declaration can trigger local file disclosure via file:// entity references or server-side request forgery via http:// entity references during SAX parsing, before the application processes a single dictionary entry. This is inconsistent with the project's own XmlUtil.createSaxParser() helper, which correctly sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by all other XML parsing paths in the codebase. The public Dictionary(InputStream) constructor delegates directly to this method and is the documented API for loading user-supplied dictionaries, making untrusted input a realistic scenario.
Mitigation: 2.x users should upgrade to 2.5.9. 3.x users should upgrade to 3.0.0-M3. Users who cannot upgrade immediately should ensure that all dictionary files are sourced from trusted origins and should consider wrapping the Dictionary(InputStream) constructor with input validation that rejects any XML containing a DOCTYPE declaration before it reaches the parser.
AI Analysis
Technical Summary
Apache OpenNLP 2.x versions before 2.5.9 and 3.x versions before 3.0.0-M3 contain an XXE vulnerability in the DictionaryEntryPersistor class. The class initializes a static SAXParserFactory at class-load time without enabling FEATURE_SECURE_PROCESSING or disabling DTD processing, so external entity resolution and DOCTYPE declarations stay enabled. When create(InputStream, EntryInserter) is called, an attacker who supplies a crafted dictionary XML file with a malicious DOCTYPE declaration can cause local file disclosure (file:// entities) or server-side request forgery (http:// entities) during SAX parsing. The public Dictionary(InputStream) constructor, the documented API for loading user dictionaries, delegates to this method, which makes untrusted input a realistic attack vector. This configuration is inconsistent with the project's XmlUtil.createSaxParser() helper, which sets FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by the other XML parsing paths. The vendor recommends upgrading to 2.5.9 or 3.0.0-M3.
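To illustrate the two settings the advisory says were missing, here is a minimal sketch of a hardened SAX setup using only standard JAXP calls. It is not OpenNLP's source code: the class name HardenedSaxSketch and its methods are hypothetical, and the sketch assumes the JDK's built-in Xerces-based parser, which recognizes the disallow-doctype-decl feature URI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: a generic hardened JAXP setup, not OpenNLP's implementation. */
public final class HardenedSaxSketch {

    private HardenedSaxSketch() {}

    /** Creates a namespace-aware XMLReader that treats any DOCTYPE as a fatal error. */
    public static XMLReader newHardenedReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Turns on the JDK's secure-processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Xerces feature: a DOCTYPE declaration aborts parsing before entities resolve.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("XML parser could not be hardened", e);
        }
    }

    /** Returns true if the hardened reader accepts the document, false if it rejects it. */
    public static boolean accepts(String xml) {
        XMLReader reader = newHardenedReader();
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a forbidden DOCTYPE
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this configuration, a document that declares an external entity is rejected at the DOCTYPE itself, so no file:// or http:// reference is dereferenced.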
Potential Impact
An attacker able to supply a crafted dictionary XML file to the vulnerable Apache OpenNLP versions can exploit this XXE vulnerability to read local files on the server or cause server-side request forgery. This can lead to unauthorized disclosure of sensitive information or interaction with internal network resources. The impact is limited to scenarios where untrusted dictionary files are processed by the vulnerable API.
Mitigation Recommendations
A fixed version is available: 2.x users should upgrade to Apache OpenNLP 2.5.9 and 3.x users to 3.0.0-M3. If an immediate upgrade is not possible, load dictionary files only from trusted origins and reject any XML containing a DOCTYPE declaration before it reaches the Dictionary(InputStream) constructor. These measures reduce the risk of exploitation until the upgrade can be performed.
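One possible shape for the DOCTYPE-rejecting wrapper is sketched below. The class DoctypeGuard and the method requireNoDoctype are hypothetical names, not part of OpenNLP. The sketch buffers the whole input in memory (assuming dictionaries are small, and Java 9+ for readAllBytes), pre-parses it with a parser that forbids DOCTYPE declarations, and returns a fresh stream over the same bytes only if that check passes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: a pre-validation guard for untrusted dictionary XML. */
public final class DoctypeGuard {

    private DoctypeGuard() {}

    /**
     * Reads the untrusted stream fully, rejects it if it is malformed or
     * contains a DOCTYPE, and otherwise returns a new stream over the same bytes.
     */
    public static InputStream requireNoDoctype(InputStream untrusted) throws IOException {
        byte[] bytes = untrusted.readAllBytes(); // assumption: input fits in memory
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Any DOCTYPE declaration becomes a fatal parse error.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // rethrows fatal errors as SAXException
            reader.parse(new InputSource(new ByteArrayInputStream(bytes)));
        } catch (SAXException | ParserConfigurationException e) {
            throw new IOException("Dictionary XML rejected before loading", e);
        }
        return new ByteArrayInputStream(bytes);
    }
}
```

A caller on an unpatched version would then pass the returned stream, rather than the raw one, to the Dictionary(InputStream) constructor. Parsing the input twice costs some time, and a size limit on the buffered input may be worth adding; the sketch is a stopgap, and upgrading remains the actual fix.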
Technical Details
- Data Version: 5.2
- Assigner Short Name: apache
- Date Reserved: 2026-04-14T17:21:09.189Z
- CVSS Version: null
- State: PUBLISHED
- Remediation Level: null
Threat ID: 69f8d216cbff5d8610397041
Added to database: 5/4/2026, 5:06:30 PM
Last enriched: 5/4/2026, 5:22:40 PM
Last updated: 5/5/2026, 5:58:01 AM
Community Reviews
0 reviewsCrowdsource mitigation strategies, share intel context, and vote on the most helpful responses. Sign in to add your voice and help keep defenders ahead.
Want to contribute mitigation steps or threat intel context? Sign in or create an account to join the community discussion.
Actions
Updates to AI analysis require Pro Console access. Upgrade inside Console → Billing.
Need more coverage?
Upgrade to Pro Console for AI refresh and higher limits.
For incident response and remediation, OffSeq services can help resolve threats faster.
Latest Threats
Check if your credentials are on the dark web
Instant breach scanning across billions of leaked records. Free tier available.