
Arbitrary HTML present after sanitization because of unicode normalization

High severity • GitHub reviewed • Published May 5, 2024 in matthiask/html-sanitizer • Updated May 6, 2024

Package

html-sanitizer (pip)

Affected versions

< 2.4.2

Patched versions

2.4.2

Description

Impact

If keep_typographic_whitespace=False is in effect (the default), the sanitizer normalizes Unicode to NFKC form as its final step. Some Unicode characters normalize to the ASCII angle brackets < and >, so specially crafted input can pass through sanitization unchanged and then turn into live HTML markup during that final normalization.
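To illustrate why the order of these steps matters, here is a minimal sketch using only the Python standard library. Both functions are hypothetical toy stand-ins, not html-sanitizer's implementation (which parses HTML rather than simply escaping it); they assume a sanitizer that neutralizes ASCII markup and then applies NFKC, mirroring the default behaviour described above. The fullwidth signs U+FF1C and U+FF1E are example characters that NFKC maps to < and >.

```python
import html
import unicodedata

# Hypothetical toy sanitizer: neutralize markup first, then NFKC-normalize
# as the last step (the ordering described in the advisory).
# This is NOT html-sanitizer's actual code.
def toy_sanitize_vulnerable(text: str) -> str:
    escaped = html.escape(text)
    return unicodedata.normalize("NFKC", escaped)

# Same two steps in the safe order: normalize first, then neutralize markup.
def toy_sanitize_fixed(text: str) -> str:
    normalized = unicodedata.normalize("NFKC", text)
    return html.escape(normalized)

# U+FF1C / U+FF1E (FULLWIDTH LESS-THAN / GREATER-THAN SIGN) are not markup,
# so escaping leaves them alone, but NFKC maps them to "<" and ">".
payload = "\uff1cscript\uff1ealert(1)\uff1c/script\uff1e"

print(toy_sanitize_vulnerable(payload))  # <script>alert(1)</script>
print(toy_sanitize_fixed(payload))       # &lt;script&gt;alert(1)&lt;/script&gt;
```

In the vulnerable order, the check runs on characters that are harmless at that moment, and the final normalization manufactures the tag afterwards.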

Patches

The problem has been fixed in 2.4.2.
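A quick way to check whether an installed copy falls in the affected range (< 2.4.2) is sketched below. The helper names are hypothetical, and the version parsing is deliberately simplified; for exact PEP 440 ordering, packaging.version.Version is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PATCHED = (2, 4, 2)  # first fixed release according to this advisory

def release_tuple(ver: str) -> tuple:
    """Leading numeric release parts, e.g. "2.4.1" -> (2, 4, 1).

    Simplification: suffixes are dropped, so a pre-release such as
    "2.4.2rc1" is treated as 2.4.2 (packaging.version handles this exactly).
    """
    parts = []
    for piece in ver.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # stop at the first non-numeric suffix
            break
    return tuple(parts)

def is_affected(ver: str) -> bool:
    # Affected versions per the advisory: < 2.4.2
    return release_tuple(ver) < PATCHED

try:
    installed = version("html-sanitizer")
    print(installed, "affected" if is_affected(installed) else "patched")
except PackageNotFoundError:
    print("html-sanitizer is not installed")
```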

Workarounds

Explicitly set keep_typographic_whitespace=True, or normalize the input to NFKC yourself before passing it to the sanitizer.
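The second workaround can be sketched as a small wrapper that normalizes before sanitizing and then verifies that the result is already NFKC-stable. The wrapper name is hypothetical, and html.escape stands in for the real sanitizer so the example runs without html-sanitizer installed; in practice the sanitize argument would be whatever sanitizing callable your application already uses. unicodedata.is_normalized requires Python 3.8 or later.

```python
import html
import unicodedata
from typing import Callable

def sanitize_prenormalized(text: str, sanitize: Callable[[str], str]) -> str:
    """Hypothetical wrapper: apply NFKC *before* sanitizing.

    `sanitize` is an assumed stand-in for the application's sanitizer.
    """
    normalized = unicodedata.normalize("NFKC", text)
    cleaned = sanitize(normalized)
    # Defensive check: if the output is already NFKC-stable, a later
    # normalization pass cannot turn it into new markup.
    if not unicodedata.is_normalized("NFKC", cleaned):
        raise ValueError("sanitizer output is not NFKC-stable")
    return cleaned

payload = "\uff1cimg src=x onerror=alert(1)\uff1e"
print(sanitize_prenormalized(payload, html.escape))
# &lt;img src=x onerror=alert(1)&gt;
```

Normalizing first means the sanitizer inspects the same characters that will ultimately be emitted, so nothing can be converted into markup after the safety checks have run.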


matthiask published to matthiask/html-sanitizer May 5, 2024
Published to the GitHub Advisory Database May 6, 2024
Reviewed May 6, 2024
Last updated May 6, 2024

Severity

High

Weaknesses

No CWEs

CVE ID

CVE-2024-34078

GHSA ID

GHSA-wvhx-q427-fgh3

