Duplicate Advisory This advisory has been withdrawn because it is a duplicate of GHSA-4qqq-9vqf-3h3f. This link is maintained to preserve external references. Original Description In scrapy/scrapy, an issue was identified where the Authorization header is not removed during redirects that only change the scheme (e.g., HTTPS to HTTP) but remain within the same domain. This behavior contravenes the Fetch standard, which mandates the removal of Authorization headers in cross-origin requests …
When using system proxy settings, which are scheme-specific (i.e. specific to http:// or https:// URLs), Scrapy was not accounting for scheme changes during redirects. For example, an HTTP request would use the proxy configured for HTTP and, when redirected to an HTTPS URL, the new HTTPS request would still use the proxy configured for HTTP instead of switching to the proxy configured for HTTPS. Same the other way around. If …
Since version 2.11.1, Scrapy drops the Authorization header when a request is redirected to a different domain. However, it keeps the header if the domain remains the same but the scheme (http/https) or the port change, all scenarios where the header should also be dropped. In the context of a man-in-the-middle attack, this could be used to get access to the value of that Authorization header
Scrapy was following redirects regardless of the URL protocol, so redirects were working for data://, file://, ftp://, s3://, and any other scheme defined in the DOWNLOAD_HANDLERS setting. However, HTTP redirects should only work between URLs that use the http:// or https:// schemes. A malicious actor, given write access to the start requests (e.g. ability to define start_urls) of a spider and read access to the spider output, could exploit this …
The scrapy/scrapy project is vulnerable to XML External Entity (XXE) attacks due to the use of lxml.etree.fromstring for parsing untrusted XML data without proper validation. This vulnerability allows attackers to perform denial of service attacks, access local files, generate network connections, or circumvent firewalls by submitting specially crafted XML data.
Duplicate Advisory This advisory has been withdrawn because it is a duplicate of GHSA-7j7m-v7m3-jqm7. This link is maintained to preserve external references. Original Description The scrapy/scrapy project is vulnerable to XML External Entity (XXE) attacks due to the use of lxml.etree.fromstring for parsing untrusted XML data without proper validation. This vulnerability allows attackers to perform denial of service attacks, access local files, generate network connections, or circumvent firewalls by submitting …
Duplicate Advisory This advisory has been withdrawn because it is a duplicate of GHSA-cw9j-q3vf-hrrv. This link is maintained to preserve external references. Original Description In scrapy versions before 2.11.1, an issue was identified where the Authorization header, containing credentials for server authentication, is leaked to a third-party site during a cross-domain redirect. This vulnerability arises from the failure to remove the Authorization header when redirecting across domains. The exposure of …
Duplicate Advisory This advisory has been withdrawn because it is a duplicate of GHSA-cw9j-q3vf-hrrv. This link is maintained to preserve external references. Original Description In scrapy versions before 2.11.1, an issue was identified where the Authorization header, containing credentials for server authentication, is leaked to a third-party site during a cross-domain redirect. This vulnerability arises from the failure to remove the Authorization header when redirecting across domains. The exposure of …
Impact Scrapy limits allowed response sizes by default through the DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE settings. However, those limits were only being enforced during the download of the raw, usually-compressed response bodies, and not during decompression, making Scrapy vulnerable to decompression bombs. A malicious website being scraped could send a small response that, on decompression, could exhaust the memory available to the Scrapy process, potentially affecting any other process sharing that memory, …
Impact When you send a request with the Authorization header to one domain, and the response asks to redirect to a different domain, Scrapy’s built-in redirect middleware creates a follow-up redirect request that keeps the original Authorization header, leaking its content to that second domain. The right behavior would be to drop the Authorization header instead, in this scenario. Patches Upgrade to Scrapy 2.11.1. If you are using Scrapy 1.8 …
Impact The following parts of the Scrapy API were found to be vulnerable to a ReDoS attack: The XMLFeedSpider class or any subclass that uses the default node iterator: iternodes, as well as direct uses of the scrapy.utils.iterators.xmliter function. Scrapy 2.6.0 to 2.11.0: The open_in_browser function for a response without a base tag. Handling a malicious response could cause extreme CPU and memory usage during the parsing of its content, …