CVE-2025-3044: LlamaIndex vulnerability in ArxivReader class can cause MD5 hash collisions

July 7, 2025 (updated July 8, 2025)

A vulnerability in the ArxivReader class of the run-llama/llama_index repository allows MD5 hash collisions when generating filenames for downloaded papers. Because the filename is derived from a hash of the paper's title, papers with identical titles but different contents map to the same file and can overwrite each other. The result is data loss: some papers are never processed, for example when the downloaded papers feed AI model training. The issue is resolved in llama-index-readers-papers version 0.3.1 (included in llama-index 0.12.28).
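The failure mode can be illustrated with a minimal sketch, assuming filenames are built solely from the MD5 hex digest of the title. This is not the actual ArxivReader code and not necessarily how the upstream fix works: the functions `title_based_filename`, `id_based_filename` and `simulate_downloads`, the in-memory "directory" dict, and the arXiv-style IDs are all hypothetical. The sketch shows that two distinct papers sharing a title end up with one filename, while keying the name on a unique paper identifier keeps both.

```python
import hashlib


def title_based_filename(title: str) -> str:
    """Hypothetical vulnerable scheme: filename is the MD5 digest of the title only."""
    return hashlib.md5(title.encode("utf-8")).hexdigest() + ".pdf"


def id_based_filename(paper_id: str) -> str:
    """Hypothetical safer scheme: hash a unique paper identifier instead of the title.

    Hashing also avoids path-unsafe characters (old-style arXiv IDs contain '/').
    """
    return hashlib.sha256(paper_id.encode("utf-8")).hexdigest() + ".pdf"


def simulate_downloads(papers, make_name):
    """Store each paper's content under its generated filename, mimicking a shared directory."""
    directory = {}
    for paper in papers:
        name = make_name(paper)
        directory[name] = paper["content"]  # a repeated name overwrites the earlier paper
    return directory


# Two different papers that happen to share a title (IDs are made up for illustration).
papers = [
    {"id": "2401.00001", "title": "A Survey of Retrieval", "content": "paper A"},
    {"id": "2401.00002", "title": "A Survey of Retrieval", "content": "paper B"},
]

vulnerable = simulate_downloads(papers, lambda p: title_based_filename(p["title"]))
safer = simulate_downloads(papers, lambda p: id_based_filename(p["id"]))

print(len(vulnerable))  # 1: the second paper overwrote the first
print(len(safer))       # 2: both papers survive
```

Note that the title-based collision here is not a cryptographic weakness of MD5: identical inputs always produce identical digests, so any hash function applied to the title alone would behave the same way. What matters is that the hashed key is unique per paper.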

References

github.com/advisories/GHSA-p7j4-jwjf-5x9w
github.com/run-llama/llama_index
github.com/run-llama/llama_index/commit/0008041e8dde8e519621388e5d6f558bde6ef42e
github.com/run-llama/llama_index/commit/f69e1c0e7579228fec4cfaf716e4f951e131de77
huntr.com/bounties/80182c3a-876f-422f-8bac-38267e0345d6
nvd.nist.gov/vuln/detail/CVE-2025-3044