CVE-2024-14021: CWE-502 Deserialization of Untrusted Data in run-llama llama_index
LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6 contain an unsafe deserialization vulnerability in BGEM3Index.load_from_disk() in llama_index/indices/managed/bge_m3/base.py. The function uses pickle.load() to deserialize multi_embed_store.pkl from a user-supplied persist_dir without validation. An attacker who can provide a crafted persist directory containing a malicious pickle file can trigger arbitrary code execution when the victim loads the index from disk.
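To triage installations against the affected range stated above, a version comparison like the following may help. It is a minimal sketch under assumptions. The default distribution name "llama-index" and the helpers parse_release, is_affected and installed_version are illustrative only. Confirm which installed distribution actually ships BGEM3Index in your environment. For production checks, prefer a full PEP 440 comparison over this simplified one.

```python
import re
from importlib import metadata

# Last affected release according to the advisory ("up to and including 0.11.6").
LAST_AFFECTED = (0, 11, 6)


def parse_release(version):
    """Return the leading numeric release segment, e.g. "0.11.6" -> (0, 11, 6).

    Simplified on purpose: pre-/post-release and local suffixes are ignored,
    so treat the result as a triage hint, not a full PEP 440 comparison.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions so "0.12" compares like (0, 12, 0).
    return parts + (0,) * max(0, 3 - len(parts))


def is_affected(version):
    """True if the release segment falls in the affected range (<= 0.11.6)."""
    return parse_release(version) <= LAST_AFFECTED


def installed_version(distribution="llama-index"):
    """Installed version of a distribution, or None if it is not installed.

    The default name is an assumption; check whichever distribution
    provides BGEM3Index in your environment.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("llama-index is not installed in this environment")
    elif is_affected(found):
        print(f"llama-index {found}: within the affected range")
    else:
        print(f"llama-index {found}: outside the affected range")
```

Because only the leading numeric segment is compared, a string such as 0.11.6.post1 is treated as 0.11.6 and flagged as affected, which errs on the side of caution.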
AI Analysis
Technical Summary
CVE-2024-14021 is a deserialization of untrusted data vulnerability (CWE-502) in the run-llama llama_index library, affecting versions up to and including 0.11.6. The flaw is in BGEM3Index.load_from_disk() (llama_index/indices/managed/bge_m3/base.py), which calls Python's pickle.load() on a file named multi_embed_store.pkl inside a caller-supplied persist_dir. Pickle is not a passive data format. A pickle stream can instruct the unpickler to import any importable callable and invoke it with attacker-chosen arguments, typically through an object's __reduce__ method, so loading a crafted file is equivalent to running the attacker's code. An attacker who controls the contents of persist_dir can therefore achieve arbitrary code execution on the victim's system, with the privileges of the loading process, as soon as the index is loaded. Exploitation requires no prior authentication. It does require that the attacker can supply or modify the persist directory and that the victim then loads the index, which implies some user interaction. The CVSS 4.0 score is 8.4 (high severity), reflecting the high potential impact on confidentiality, integrity, and availability that arbitrary code execution brings. No patch or fix is currently linked, and no exploitation in the wild has been reported. The issue is most relevant to AI and machine learning workflows that persist embeddings or indexes to disk and later reload them.
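The sketch below illustrates why unpickling an attacker-controlled file amounts to running attacker code. It is not the library's code. naive_load is a hypothetical simplification of the pattern described above: pickle.load() on multi_embed_store.pkl in a caller-supplied directory. MarkerPayload is a harmless stand-in whose __reduce__ method makes the unpickler call os.mkdir on a marker path inside a temporary directory.

```python
import os
import pickle
import tempfile


class MarkerPayload:
    """Harmless stand-in for a malicious pickle.

    __reduce__ tells pickle how to "rebuild" the object: by calling
    os.mkdir(marker_path) at load time. A real attacker would name a
    dangerous callable here instead.
    """

    def __init__(self, marker_path):
        self.marker_path = marker_path

    def __reduce__(self):
        return (os.mkdir, (self.marker_path,))


def naive_load(persist_dir):
    # Hypothetical simplification of the vulnerable pattern: unpickle a
    # fixed file name from a caller-supplied directory with no checks.
    path = os.path.join(persist_dir, "multi_embed_store.pkl")
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as workdir:
    marker = os.path.join(workdir, "payload_ran")

    # "Attacker" side: drop the crafted file into a persist directory.
    with open(os.path.join(workdir, "multi_embed_store.pkl"), "wb") as f:
        pickle.dump(MarkerPayload(marker), f)

    # "Victim" side: merely loading the file executes os.mkdir(marker).
    naive_load(workdir)
    payload_ran = os.path.isdir(marker)

print(payload_ran)
```

Running it prints True. The marker directory exists only because pickle.load() called os.mkdir while reading the file; the victim code never invoked the payload directly.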
Potential Impact
For European organizations, this vulnerability poses a significant risk if they use vulnerable versions of llama_index in AI, data indexing, or machine learning pipelines. Successful exploitation yields arbitrary code execution. That can compromise confidentiality (access to sensitive data), integrity (modification or corruption of data), and availability (service disruption or data deletion). The results can include data breaches, operational disruption, or lateral movement to other systems on the network. Organizations in finance, healthcare, research, and critical infrastructure that rely on AI tooling and data indexing are particularly at risk. The need to supply a malicious persist directory limits remote exploitation, but it does not eliminate risk where indexes come from untrusted sources or live on shared storage. Examples include index directories obtained from third parties and directories on shared drives that other users can write to. The absence of known exploits offers a window for proactive mitigation before active attacks occur.
Mitigation Recommendations
European organizations should immediately audit their use of the llama_index library and identify any deployments running versions up to and including 0.11.6. Until a patch is available, do not load indexes from untrusted or user-supplied persist directories. Inspecting or "sanitizing" a pickle file is not a dependable defense, because the payload executes during loading itself. Instead, verify the integrity and provenance of persisted files, for example with a checksum or HMAC recorded when the index was written, and load only from locations that your own processes control. Where possible, replace pickle-based persistence with formats that cannot execute code on load, such as JSON. Use application-level and file-system access controls to restrict which users or processes can create or modify persist directories. Use endpoint protection and monitoring to detect anomalous file modifications or unexpected processes spawned by applications that use llama_index. Maintain network segmentation to limit the impact of a successful compromise. Follow vendor advisories and apply patches promptly once they are released.
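One way to apply the integrity-verification advice is to record an HMAC of the pickle bytes when the index is persisted. The loader then refuses to unpickle anything whose tag does not match. This is a minimal sketch under stated assumptions, not a llama_index feature. The sidecar file name multi_embed_store.pkl.hmac and the helpers persist_with_tag and load_if_authentic are hypothetical. The key must be kept outside the persist directory, for example in a secrets manager. The check only proves the bytes were written by a key holder; it does not make pickle safe against anyone who obtains the key.

```python
import hashlib
import hmac
import os
import pickle
import tempfile

STORE_NAME = "multi_embed_store.pkl"
# Hypothetical sidecar file holding the HMAC tag written at persist time.
TAG_NAME = STORE_NAME + ".hmac"


def _tag(key, data):
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def persist_with_tag(obj, persist_dir, key):
    """Pickle obj into persist_dir and record an HMAC-SHA256 tag beside it."""
    data = pickle.dumps(obj)
    with open(os.path.join(persist_dir, STORE_NAME), "wb") as f:
        f.write(data)
    with open(os.path.join(persist_dir, TAG_NAME), "w", encoding="ascii") as f:
        f.write(_tag(key, data))


def load_if_authentic(persist_dir, key):
    """Unpickle the store only if its bytes match the recorded tag."""
    with open(os.path.join(persist_dir, STORE_NAME), "rb") as f:
        data = f.read()
    try:
        with open(os.path.join(persist_dir, TAG_NAME), encoding="ascii") as f:
            expected = f.read().strip()
    except FileNotFoundError:
        raise PermissionError("missing integrity tag; refusing to unpickle") from None
    # Constant-time comparison; pickle.loads runs only after the check passes.
    if not hmac.compare_digest(_tag(key, data), expected):
        raise PermissionError("integrity check failed; refusing to unpickle")
    return pickle.loads(data)


key = os.urandom(32)  # in practice, fetch from a secrets manager, not the persist dir
with tempfile.TemporaryDirectory() as persist_dir:
    persist_with_tag({"doc-1": [0.1, 0.2]}, persist_dir, key)
    restored = load_if_authentic(persist_dir, key)

    # Simulate tampering: replace the pickle but leave the old tag in place.
    with open(os.path.join(persist_dir, STORE_NAME), "wb") as f:
        f.write(pickle.dumps({"doc-1": [9.9]}))
    try:
        load_if_authentic(persist_dir, key)
        tamper_detected = False
    except PermissionError:
        tamper_detected = True

print(restored, tamper_detected)
```

In this run, restored holds the original dictionary. tamper_detected is True because the replaced file no longer matches its tag, so the tampered bytes are never passed to pickle.loads.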
Affected Countries
Germany, France, United Kingdom, Netherlands, Sweden, Finland
Technical Details
- Data Version: 5.2
- Assigner Short Name: VulnCheck
- Date Reserved: 2026-01-09T20:42:56.495Z
- CVSS Version: 4.0
- State: PUBLISHED
Threat ID: 69658281da2266e838450d16
Added to database: 1/12/2026, 11:23:45 PM
Last enriched: 1/12/2026, 11:38:32 PM
Last updated: 1/13/2026, 1:30:59 AM
Related Threats
- CVE-2026-22214: CWE-121 Stack-based Buffer Overflow in RIOT RIOT OS (Medium)
- CVE-2026-22213: CWE-121 Stack-based Buffer Overflow in RIOT RIOT OS (Low)
- CVE-2024-58340: CWE-1333 Inefficient Regular Expression Complexity in LangChain AI LangChain (High)
- CVE-2024-58339: CWE-770 Allocation of Resources Without Limits or Throttling in run-llama llama_index (High)
- CVE-2026-22813: CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') in anomalyco opencode (Critical)