justhtml 1.16.0 fixes multiple security issues in sanitization, serialization, and programmatic DOM handling. Most of these issues affected one of these advanced paths rather than ordinary parsed HTML with the default safe settings: programmatic DOM input to sanitize() or sanitize_dom() reused or mutated sanitization policy objects custom policies that preserve foreign namespaces such as SVG or MathML
justhtml 1.15.0 includes multiple security fixes affecting URL sanitization helpers, HTML serialization, Markdown passthrough, and several custom sanitization-policy edge cases. These issues have different impact levels and do not all affect the default configuration in the same way.
A parser-differential / mutation XSS issue was found in justhtml when using a custom sanitization policy that preserves foreign namespaces such as SVG or MathML. Under these custom settings, specially crafted input could sanitize into HTML that looked safe at first, but became unsafe when parsed again by a browser or another HTML parser.
to_markdown() is vulnerable when serializing attacker-controlled <pre> content. The <pre> handler emits a fixed three-backtick fenced code block, but writes decoded text content into that fence without choosing a delimiter longer than any backtick run inside the content. An attacker can place backticks and HTML-like text inside a sanitized <pre> element so that the generated Markdown closes the fence early and leaves raw HTML outside the code block. When that …
to_markdown() does not sufficiently escape text content that looks like HTML. As a result, untrusted input that is safe in to_html() can become raw HTML in Markdown output. This is not specific to tokenizer raw-text states like <title>, <noscript>, or <plaintext>, although those states can trigger the behavior. The root cause is broader: Markdown text serialization leaves angle brackets unescaped in text nodes.
Sanitized DOM trees can be unsafe to serialize when a custom policy allows raw-text elements such as <style> or <script>. The issue affects DOM trees that are constructed or modified programmatically and then passed through sanitize_dom() with a policy that keeps these elements. Text nodes inside <style> and <script> are serialized literally, so attacker-controlled text containing the matching closing tag sequence can break out of the raw-text context and inject …
justhtml through 1.9.1 allows denial of service via deeply nested HTML. During parsing, JustHTML.init() always reaches TreeBuilder.finish(), which unconditionally calls _populate_selectedcontent(). That function recursively traverses the DOM via _find_elements() / _find_element() without a depth bound, allowing attacker-controlled deeply nested input to trigger an unhandled RecursionError on CPython. Depending on the host application's exception handling, this can abort parsing, fail requests, or terminate a worker/process.