CVE-2024-0243: CWE-918 Server-Side Request Forgery (SSRF) in langchain-ai langchain-ai/langchain
With the following crawler configuration:

```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader

url = "https://example.com"

loader = RecursiveUrlLoader(
    url=url,
    max_depth=2,
    # Strip the markup and keep only the page text.
    extractor=lambda x: Soup(x, "html.parser").text,
)
docs = loader.load()
```

An attacker in control of the contents of `https://example.com` could serve a malicious HTML page containing links such as "https://example.completely.different/my_file.html", and the crawler would download that file as well, even though `prevent_outside=True`.

Affected code: https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51

Resolved in https://github.com/langchain-ai/langchain/pull/15559
AI Analysis
Technical Summary
CVE-2024-0243 is a Server-Side Request Forgery (SSRF) vulnerability in the langchain-ai/langchain project, specifically in the RecursiveUrlLoader component used for crawling web content. It arises from improper validation of URLs during recursive crawling.

In the example, the RecursiveUrlLoader starts from a URL (e.g., https://example.com) with a maximum depth of 2 and uses BeautifulSoup to extract text content. An attacker who controls the content served at that URL can embed HTML containing links to external domains (e.g., https://example.completely.different/my_file.html). Although the loader has a parameter intended to prevent crawling outside the original domain (prevent_outside=True), it follows these external links and downloads their content.

This lets an attacker coerce the server running the langchain crawler into making arbitrary HTTP requests to external or internal network resources, potentially bypassing network access controls or firewall restrictions. The vulnerability is classified under CWE-918 (Server-Side Request Forgery), a class of flaw in which an attacker induces the server to contact unintended locations, possibly exposing sensitive internal services or data.

The relevant code is in langchain_community/document_loaders/recursive_url_loader.py. A fix has been merged in pull request 15559 on the project's GitHub repository; this entry lists no official patch release or CVSS score. There are no known exploits in the wild at this time. The affected versions are unspecified, so any version that ships the RecursiveUrlLoader without the fix may be impacted.
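The example URLs hint at how a domain restriction can fail: the string `https://example.completely.different/my_file.html` begins with `https://example.com`. The sketch below assumes, for illustration only, that the restriction was a plain string-prefix test (the advisory text does not quote the check itself) and contrasts it with a comparison of the parsed scheme and host. Both function names are hypothetical and are not part of langchain.

```python
from urllib.parse import urlparse


def naive_is_inside(base_url: str, link: str) -> bool:
    """Hypothetical prefix check: treats any link that starts with base_url as 'inside'."""
    return link.startswith(base_url)


def strict_is_inside(base_url: str, link: str) -> bool:
    """Hypothetical stricter check: scheme and network location must match exactly."""
    base, target = urlparse(base_url), urlparse(link)
    return (target.scheme, target.netloc.lower()) == (base.scheme, base.netloc.lower())


base = "https://example.com"
outside = "https://example.completely.different/my_file.html"
inside = "https://example.com/docs/page.html"

# The prefix check accepts the foreign host because the strings share a prefix.
prefix_accepts_outside = naive_is_inside(base, outside)
# The parsed comparison rejects it and still accepts a same-host link.
strict_accepts_outside = strict_is_inside(base, outside)
strict_accepts_inside = strict_is_inside(base, inside)
```

Comparing parsed components rather than raw strings is the design point here: a host name only matches when the whole `netloc` is equal, not when one URL happens to be a textual prefix of another.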
Potential Impact
For European organizations using langchain-ai/langchain, especially those deploying the RecursiveUrlLoader for web crawling or document ingestion, this SSRF vulnerability poses a significant risk. An attacker who controls or influences the content at the initial URL can make the server perform unauthorized requests to internal or external systems. This can lead to several impacts:

1) Confidentiality: Attackers may reach internal-only services or metadata endpoints that are not exposed externally, potentially leaking sensitive information.
2) Integrity: By forcing requests to internal APIs or services, attackers might trigger unintended actions or data modifications if those services lack proper authentication.
3) Availability: SSRF can be used to mount denial-of-service attacks on internal resources by overwhelming them with requests.

Because langchain is often used in AI and data processing pipelines, exploitation could compromise the integrity of data inputs or expose private datasets. The medium severity rating reflects that exploitation requires control over the initial URL content, which limits the attack vectors but does not eliminate the risk, especially in environments that ingest user-supplied URLs. Once that condition is met, the SSRF requires neither authentication nor user interaction, which raises the threat level. European organizations in sectors with sensitive internal networks (e.g., finance, healthcare, government) are particularly at risk if they run vulnerable versions of langchain in their infrastructure.
Mitigation Recommendations
1) Immediate Upgrade: Organizations should update langchain-ai/langchain to the latest version containing the fix merged in pull request 15559, ensuring the RecursiveUrlLoader properly enforces domain restrictions.
2) Input Validation: Implement strict validation and sanitization of URLs before passing them to the RecursiveUrlLoader, limiting crawling to trusted domains only.
3) Network Segmentation: Restrict the network environment where langchain runs, preventing it from accessing sensitive internal services or metadata endpoints that could be targeted by SSRF.
4) Monitoring and Logging: Enable detailed logging of all outbound HTTP requests initiated by langchain processes to detect anomalous or unexpected external requests.
5) Use of Web Application Firewalls (WAFs): Deploy WAFs or proxy solutions that can detect and block SSRF patterns or unusual request destinations originating from internal servers.
6) Least Privilege: Run langchain services with minimal network privileges and avoid running them in environments with broad internal network access.
7) Security Testing: Incorporate SSRF-specific tests in the CI/CD pipeline to detect regressions or new vulnerabilities in recursive URL loading features.

These measures go beyond generic advice by focusing on controlling the crawler’s input, restricting network access, and enhancing detection capabilities.
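The input-validation recommendation can be sketched as a pre-flight check that runs before a URL is handed to the crawler. This is a minimal illustration under stated assumptions, not langchain's own fix: the function name `is_safe_to_crawl` and the allowlist contents are hypothetical, and the check does not resolve DNS, so it does not defend against a trusted host name that resolves to an internal address.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the crawler may contact.
ALLOWED_HOSTS = {"example.com"}


def is_safe_to_crawl(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only http(s) URLs whose host is explicitly allowlisted.

    IP-literal hosts are additionally required to be globally routable, which
    rejects loopback, private, and link-local (metadata endpoint) addresses.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname  # lowercased by urlparse; None if absent
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: require an exact host-name match.
        return host in allowed_hosts
    return ip.is_global and host in allowed_hosts


checks = {
    "same_host": is_safe_to_crawl("https://example.com/docs", ALLOWED_HOSTS),
    "lookalike": is_safe_to_crawl(
        "https://example.completely.different/my_file.html", ALLOWED_HOSTS
    ),
    "metadata": is_safe_to_crawl(
        "http://169.254.169.254/latest/meta-data/", ALLOWED_HOSTS
    ),
    "file_scheme": is_safe_to_crawl("file:///etc/passwd", ALLOWED_HOSTS),
}
```

The same assertions can double as the SSRF regression tests suggested in item 7: each rejected URL corresponds to a destination the crawler should never request.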
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland, Belgium, Italy
Technical Details
- Data Version: 5.1
- Assigner Short Name: @huntr_ai
- Date Reserved: 2024-01-04T21:47:13.281Z
- Cisa Enriched: true
Threat ID: 682d9849c4522896dcbf6b9e
Added to database: 5/21/2025, 9:09:29 AM
Last enriched: 6/21/2025, 9:56:25 PM
Last updated: 7/31/2025, 12:12:40 AM