GHSA-23j4-mw76-5v7h: Scrapy allows redirect following in protocols other than HTTP
Scrapy followed redirects regardless of the URL scheme, so a redirect could target data://, file://, ftp://, s3://, or any other scheme defined in the DOWNLOAD_HANDLERS setting.
However, HTTP redirects should only work between URLs that use the http:// or https:// schemes.
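The scheme rule stated above can be sketched as a small check. This is only an illustration using the Python standard library, not Scrapy's actual code; the names `is_safe_redirect` and `ALLOWED_REDIRECT_SCHEMES` are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Assumption for illustration: only these schemes are valid redirect targets.
ALLOWED_REDIRECT_SCHEMES = {"http", "https"}


def is_safe_redirect(request_url: str, location: str) -> bool:
    """Return True only if the redirect target resolves to an http(s) URL.

    `location` is the value of the Location header; it may be relative,
    so it is resolved against the URL of the request that was redirected.
    """
    target = urljoin(request_url, location)
    # urlsplit() normalizes the scheme to lowercase.
    return urlsplit(target).scheme in ALLOWED_REDIRECT_SCHEMES
```

With this check, a relative `Location` such as `/login` resolves to an http:// URL and is accepted, while `file:///etc/passwd`, `ftp://...`, and `s3://...` targets are rejected.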
A malicious actor, given write access to the start requests (e.g. ability to define start_urls) of a spider and read access to the spider output, could exploit this vulnerability to:
- Redirect to any local file using the file:// scheme to read its contents.
- Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.
- Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.
For file:// and s3://, how the spider implements its parsing of input data into an output item determines what data would be vulnerable. A spider that always outputs the entire contents of a response would be completely vulnerable, while a spider that extracted only fragments from the response could significantly limit vulnerable data.
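One way to narrow the exposure described above, in projects that never need these schemes, is to disable the corresponding download handlers. The fragment below is a sketch of a project settings file, not a measure taken from this advisory. It relies on Scrapy's documented behavior that assigning None to a scheme in DOWNLOAD_HANDLERS disables that handler, and it is not a substitute for running a patched release.

```python
# settings.py (sketch): assumes the project never needs to fetch
# file://, ftp://, or s3:// URLs, including in its start requests.
DOWNLOAD_HANDLERS = {
    "file": None,  # disable reading local files
    "ftp": None,   # disable FTP downloads (and use of configured FTP credentials)
    "s3": None,    # disable S3 downloads (and use of configured S3 credentials)
}
```

With these handlers disabled, requests for those schemes fail instead of being downloaded, so this only fits spiders that work exclusively with http:// and https:// URLs.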