CVE-2026-28350: CWE-116: Improper Encoding or Escaping of Output in fedora-python lxml_html_clean
lxml_html_clean is a project for HTML cleaning functionalities copied from `lxml.html.clean`. Prior to version 0.4.4, the <base> tag passes through the default Cleaner configuration. While page_structure=True removes html, head, and title tags, there is no specific handling for <base>, allowing an attacker to inject it and hijack relative links on the page. This issue has been patched in version 0.4.4.
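To make the link-hijacking effect concrete, here is a minimal sketch that approximates browser URL resolution with the standard library's `urljoin`. The helper `resolve` and the hostnames `app.example` and `attacker.example` are hypothetical, chosen only for illustration, and `urljoin` is a close stand-in for browser behavior rather than an exact model of it.

```python
from urllib.parse import urljoin


def resolve(href, document_url, base_href=None):
    """Resolve a link roughly as a browser would.

    If the document contains <base href=...>, relative URLs are resolved
    against that value instead of the document's own URL.
    """
    effective_base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(effective_base, href)


# Hypothetical page that displays "cleaned" user-supplied HTML.
page_url = "https://app.example/profile/42"
# Hypothetical value an attacker places in an injected <base> tag.
injected_base = "https://attacker.example/"

# Without <base>, the relative link stays on the legitimate site.
print(resolve("login", page_url))
# With the injected <base>, the same link now points at the attacker.
print(resolve("login", page_url, injected_base))
# Absolute URLs are not affected by <base>.
print(resolve("https://app.example/help", page_url, injected_base))
```

Only relative URLs change meaning, which is why pages that link with relative paths are the ones exposed.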
AI Analysis
Technical Summary
CVE-2026-28350 affects lxml_html_clean, a Python library that packages the HTML-cleaning functionality copied from lxml.html.clean and is used to sanitize HTML input. Prior to version 0.4.4, the default Cleaner configuration has no specific handling for the <base> tag. Enabling page_structure removes structural tags such as <html>, <head>, and <title>, but <base> passes through unfiltered. Because <base> sets the base URL against which every relative URL in the document is resolved, an attacker who injects a <base> tag can make relative links on the page resolve to an attacker-controlled location. The flaw is classified as CWE-116 (Improper Encoding or Escaping of Output), which covers failure to properly sanitize or encode output data. It affects confidentiality and integrity by enabling link hijacking, which can support phishing or redirection attacks. The CVSS v3.1 base score is 6.1 (medium severity), reflecting a network attack vector, low attack complexity, no privileges required, required user interaction, and a changed scope because linked resources are affected. The vulnerability was published on March 5, 2026, and is patched in lxml_html_clean version 0.4.4. There are no known exploits in the wild at this time.
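As a cross-check on the 6.1 score, the following sketch applies the CVSS v3.1 base-score equations for a changed scope. The attack vector, complexity, privileges, user interaction, and scope values come from the summary above. The impact values (low confidentiality, low integrity, no availability impact) are an assumption inferred from the text, which does not state them explicitly. The function names are illustrative.

```python
import math

# Numeric weights from the CVSS v3.1 specification.
AV_NETWORK = 0.85
AC_LOW = 0.77
PR_NONE = 0.85
UI_REQUIRED = 0.62
C_LOW = 0.22   # assumption: confidentiality impact is Low
I_LOW = 0.22   # assumption: integrity impact is Low
A_NONE = 0.0   # assumption: no availability impact


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_changed(av, ac, pr, ui, c, i, a):
    """Base score using the equations for Scope: Changed."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(1.08 * (impact + exploitability), 10))


score = base_score_scope_changed(
    AV_NETWORK, AC_LOW, PR_NONE, UI_REQUIRED, C_LOW, I_LOW, A_NONE
)
print(score)  # 6.1 under the assumed impact values
```

Under these assumed metrics the computed value matches the published score, which suggests the assumed impact values are at least consistent with it.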
Potential Impact
An attacker can inject a malicious <base> tag into HTML cleaned by a vulnerable version of lxml_html_clean, so that relative links on the resulting page resolve to an attacker-controlled site. This can facilitate phishing, credential theft, or malware distribution by misleading users about where links lead. Organizations that rely on lxml_html_clean to sanitize user-generated or external HTML may unknowingly expose their users to these risks. The impact falls on confidentiality and integrity: attackers can manipulate link targets, but availability is not affected. Exploitation requires user interaction (clicking a link), which reduces the risk somewhat, but it remains significant for web applications that sanitize and display untrusted HTML. Web portals, content management systems, and any Python-based service using this library could be affected. No exploits are known in the wild, but the vulnerability should still be addressed proactively.
Mitigation Recommendations
The primary mitigation is to upgrade lxml_html_clean to version 0.4.4 or later, where the <base> tag is handled. Organizations that cannot upgrade immediately should add a sanitization step that explicitly removes or neutralizes <base> tags before or after calling lxml_html_clean. Where possible, restrict the HTML being cleaned to trusted sources. Deploy Content Security Policy (CSP) headers to limit the impact of an injected tag; the base-uri directive in particular restricts which URLs a <base> element may set. Educate users to be cautious with links, especially ones that look suspicious or redirect unexpectedly. Monitor web application logs for unusual link redirection patterns that may indicate exploitation attempts. Finally, add automated security tests to development pipelines to detect improper HTML sanitization.
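For the interim sanitization layer mentioned above, one possible approach is a small post-filter built on the standard library's `html.parser`. This is a sketch, not the fix shipped in version 0.4.4: the names `BaseTagStripper` and `strip_base_tags` are hypothetical, and the standard-library parser is not a full HTML5 parser, so it may tokenize unusual markup differently from a browser. Treat it as defense in depth rather than a replacement for upgrading.

```python
from html.parser import HTMLParser


class BaseTagStripper(HTMLParser):
    """Re-emit HTML unchanged, except that <base> tags are dropped."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <BASE> is caught as well.
        if tag != "base":
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "base":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "base":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_base_tags(html):
    """Return html with every <base> start and end tag removed."""
    parser = BaseTagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


cleaned = strip_base_tags(
    '<base href="https://attacker.example/"><a href="login">Log in</a>'
)
print(cleaned)  # <a href="login">Log in</a>
```

Running the filter on the output of the cleaner, rather than on its input, means it sees the same markup that will be served to users.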
Affected Countries
United States, Germany, United Kingdom, France, Japan, Canada, Australia, India, Brazil, South Korea
Technical Details
- Data Version: 5.2
- Assigner Short Name: GitHub_M
- Date Reserved: 2026-02-26T18:38:13.890Z
- CVSS Version: 3.1
- State: PUBLISHED
Threat ID: 69a9e2f561e8e69ef5e92412
Added to database: 3/5/2026, 8:09:25 PM
Last enriched: 3/12/2026, 8:26:31 PM
Last updated: 4/19/2026, 6:16:05 AM