CVE-2026-0847: CWE-22 Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') in nltk nltk/nltk
A vulnerability in NLTK versions up to and including 3.9.2 allows arbitrary file read via path traversal in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader. These classes fail to properly sanitize or validate file paths, enabling attackers to traverse directories and access sensitive files on the server. This issue is particularly critical in scenarios where user-controlled file inputs are processed, such as in machine learning APIs, chatbots, or NLP pipelines. Exploitation of this vulnerability can lead to unauthorized access to sensitive files, including system files, SSH private keys, and API tokens, and may potentially escalate to remote code execution when combined with other vulnerabilities.
AI Analysis
Technical Summary
CVE-2026-0847 is a path traversal vulnerability classified under CWE-22, affecting NLTK, a widely used Python toolkit for natural language processing. The flaw exists in multiple CorpusReader classes, including WordListCorpusReader, TaggedCorpusReader, and BracketParseCorpusReader, which do not adequately validate or sanitize file path inputs. Because the pathname is not confined to the corpus directory, an attacker can craft file paths that traverse outside the intended corpus directories and read arbitrary files on the host system, potentially exposing system configuration files, SSH private keys, and API tokens. The vulnerability requires no authentication and no user interaction, and it can be exploited remotely when the vulnerable NLTK components process user-supplied file paths, as is common in machine learning APIs, chatbots, and NLP pipelines. No public exploits have been reported yet. The CVSS v3.0 base score of 8.6 (High) reflects the severe impact on confidentiality, with some impact on integrity and availability. The vulnerability could also serve as a stepping stone to remote code execution if combined with other vulnerabilities. No official patch was available at the time of disclosure, so users of affected NLTK versions need to apply mitigations immediately.
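The general CWE-22 pattern described above can be illustrated with a short, self-contained sketch. It does not reproduce NLTK's internal code; the corpus root and the helper names `naive_join` and `escapes_root` are hypothetical and exist only to show how an unchecked join lets `..` segments or an absolute path leave the intended directory.

```python
import posixpath

# Hypothetical corpus root, used for illustration only.
CORPUS_ROOT = "/srv/app/corpora"


def naive_join(root: str, fileid: str) -> str:
    """Join a file identifier onto the root with no containment check.

    This mirrors the general vulnerable pattern, not NLTK's actual code.
    """
    return posixpath.normpath(posixpath.join(root, fileid))


def escapes_root(root: str, fileid: str) -> bool:
    """Return True if the joined path ends up outside the root directory."""
    root_norm = posixpath.normpath(root)
    resolved = naive_join(root_norm, fileid)
    # commonpath compares whole path components, so a sibling directory
    # such as /srv/app/corpora_evil is not mistaken for the root.
    return posixpath.commonpath([root_norm, resolved]) != root_norm


if __name__ == "__main__":
    print(naive_join(CORPUS_ROOT, "../../../etc/passwd"))  # /etc/passwd
    for fileid in ("words/en.txt", "../../../etc/passwd", "/etc/passwd"):
        print(fileid, "escapes root:", escapes_root(CORPUS_ROOT, fileid))
```

Both the relative `../` form and the absolute form end up outside the corpus root, which is why a check on the final resolved path is more reliable than filtering for `..` in the input string.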
Potential Impact
The primary impact of CVE-2026-0847 is unauthorized disclosure of sensitive information through arbitrary file read. Organizations running vulnerable NLTK versions in production risk exposing critical files such as system credentials, private keys, and API tokens, which can lead to further infrastructure compromise and data breaches. Where NLP services accept external input, attackers can exploit the vulnerability remotely without authentication, which widens the attack surface. The potential for escalation to remote code execution, while indirect, raises the stakes significantly. The vulnerability primarily undermines confidentiality; effects on integrity or service availability would be secondary, for example when leaked credentials are later used to modify or disrupt critical systems. Organizations relying on NLTK for machine learning, chatbots, or data processing pipelines face risks of data leakage, loss of trust, regulatory non-compliance, and operational disruption.
Mitigation Recommendations
1. Upgrade NLTK to a version that addresses this vulnerability once an official patch is released.
2. Until patched, implement strict input validation and sanitization on all user-supplied file paths before passing them to NLTK CorpusReader classes.
3. Employ application-layer whitelisting to restrict file access strictly to intended corpus directories.
4. Use containerization or sandboxing to isolate NLP processing environments, limiting the impact of potential exploitation.
5. Monitor logs for unusual file access patterns indicative of path traversal attempts.
6. Restrict file system permissions for the application user to minimize access to sensitive files outside the corpus directories.
7. Consider disabling or restricting features that accept external file paths if not essential.
8. Conduct security reviews and penetration testing focused on file path handling in NLP-related services.
These steps go beyond generic advice by focusing on containment, input validation, and environment hardening specific to the vulnerability context.
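Recommendations 2 and 3 could be approached along the following lines. This is a minimal sketch under stated assumptions, not an official fix: `resolve_inside`, `UnsafePathError`, and the `.txt` suffix allowlist are hypothetical names and choices made for illustration. The idea is to resolve the user-supplied path against the corpus root first and only hand it to a corpus reader if the result is still inside that root.

```python
from pathlib import Path


class UnsafePathError(ValueError):
    """Raised when a user-supplied path fails validation (hypothetical type)."""


def resolve_inside(root, user_path, allowed_suffixes=(".txt",)):
    """Resolve user_path under root and reject anything that escapes it.

    Assumption: corpus files use one of `allowed_suffixes`; adjust as needed.
    """
    if "\x00" in user_path:
        raise UnsafePathError("null byte in path")

    root_path = Path(root).resolve()
    # resolve() collapses '..' segments and follows symlinks, so the
    # containment check below applies to the final on-disk location.
    candidate = (root_path / user_path).resolve()

    try:
        candidate.relative_to(root_path)
    except ValueError:
        raise UnsafePathError(f"path escapes corpus root: {user_path!r}") from None

    if candidate.suffix not in allowed_suffixes:
        raise UnsafePathError(f"file type not allowed: {candidate.suffix!r}")

    return candidate


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as corpus_root:
        print(resolve_inside(corpus_root, "words/en.txt"))
        for bad in ("../../etc/passwd", "/etc/passwd", "notes.pem"):
            try:
                resolve_inside(corpus_root, bad)
            except UnsafePathError as exc:
                print("rejected:", exc)
```

Checking the resolved path rather than scanning the raw string for `..` also covers absolute paths and symlinks that point outside the root. Validation of this kind complements, and does not replace, the sandboxing and file-permission measures listed above.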
Affected Countries
United States, India, China, Germany, United Kingdom, Canada, Australia, France, Japan, South Korea
Technical Details
- Data Version: 5.2
- Assigner Short Name: @huntr_ai
- Date Reserved: 2026-01-10T23:57:44.460Z
- CVSS Version: 3.0
- State: PUBLISHED
Threat ID: 69a87af1d1a09e29cb563d89
Added to database: 3/4/2026, 6:33:21 PM
Last enriched: 3/11/2026, 8:23:06 PM
Last updated: 4/18/2026, 7:55:49 PM