Url extra escaping #533

Open
andrewQwer opened this issue Feb 23, 2024 · 3 comments

andrewQwer commented Feb 23, 2024

Hi, I'm using HtmlSanitizer to sanitize markup, and after updating the library from 5.x to 8.x the & sign in URLs gets escaped.
The problem is that I can't find where this happens.

I have the following code:

var gj = new HtmlSanitizer
{
    OutputFormatter = HtmlMarkupFormatter.Instance,
    AllowDataAttributes = true
};
gj.Sanitize("<img src='http://foobar.com?x=5&y-6'>");

The output is: <img src="http://foobar.com?x=5&amp;y-6"> (note the added &amp;).
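For reference, &amp; is the HTML-escaped form of & inside an attribute value, so a parser that reads the sanitized output recovers the original URL. A minimal sketch using only the .NET base library; the class name EntityRoundTrip is hypothetical and not part of HtmlSanitizer:

```csharp
using System.Net;

public static class EntityRoundTrip
{
    // Returns the value an HTML parser would see after decoding entities.
    public static string Decode(string escapedAttributeValue) =>
        WebUtility.HtmlDecode(escapedAttributeValue);

    // Returns the escaped form that a serializer would typically write out.
    public static string Encode(string rawAttributeValue) =>
        WebUtility.HtmlEncode(rawAttributeValue);
}
```

Calling EntityRoundTrip.Decode on the escaped URL from the output above yields the URL that was passed in, which is why the escaped markup still points at the same address when rendered.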

I tried to do the following:

gj.FilterUrl += (object o, FilterUrlEventArgs e) => {
    Console.WriteLine(e.OriginalUrl); //shows <img src='http://foobar.com?x=5&y-6'>
    Console.WriteLine(e.SanitizedUrl); // shows <img src='http://foobar.com?x=5&y-6'>
};

Both values are identical in this event, so there is no way to fix it at this stage.

Ok, I tried the following:

gj.PostProcessDom += (sender, args) =>
{
    var doc = args.Document;
    var imgNodes = doc.QuerySelectorAll("img");
    foreach (var imgNode in imgNodes)
    {
        Console.WriteLine("SRC in DOC:" + imgNode.GetAttribute("src")); //shows SRC in DOC: http://foobar.com?x=5&y-6
    }
};

So even in the PostProcessDom event the attribute value is not yet escaped. The same is true for the PostProcessNode event.
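A likely explanation (an assumption, not confirmed in this thread) is that these events operate on the parsed DOM, where attribute values are stored decoded, and that escaping is applied only later, when the DOM is serialized back to markup. The sketch below uses AngleSharp directly to compare the two views; DomVersusMarkup and Inspect are hypothetical names, and the exact serialized text may differ between AngleSharp versions:

```csharp
using AngleSharp.Dom;
using AngleSharp.Html.Parser;

public static class DomVersusMarkup
{
    // Parses the markup and returns the src value as stored in the DOM
    // together with the element as written back out by the default formatter.
    public static (string DomValue, string Markup) Inspect(string html)
    {
        var document = new HtmlParser().ParseDocument(html);
        var img = document.QuerySelector("img");

        // The DOM holds the decoded attribute value.
        var domValue = img?.GetAttribute("src");

        // Any escaping is applied here, at serialization time.
        var markup = img?.OuterHtml ?? string.Empty;

        return (domValue, markup);
    }
}
```

If this assumption holds, no DOM-level event can observe the escaped value, because the escaped text only exists in the serialized string.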

What else can I do to restore URLs in src/href attributes to their original unescaped values?


tiesont commented Feb 23, 2024

Possibly relevant, although not a fix: #401


andrewQwer commented Feb 24, 2024

> Possibly relevant, although not a fix: #401

Yes, indeed. I'm OK with the escaping, but I would like to have a way to fix it somehow, at least in the events. The FilterUrl event seems the most logical place, but at the moment the event fires the URL is still unescaped.

Also found this issue:

AngleSharp/AngleSharp#348

@andrewQwer

For now I wrote the following fix using a custom OutputFormatter: https://dotnetfiddle.net/wbtvUI. Let me know if there is another way to catch escaped values in the events.
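The linked fiddle is not reproduced here. As a rough illustration of the general idea only, a custom formatter could look like the sketch below. It assumes that AngleSharp's HtmlMarkupFormatter exposes a protected virtual Attribute(IAttr) method that can be overridden (check this against the installed AngleSharp version); RawAmpersandFormatter and RawAmpersandDemo are hypothetical names:

```csharp
using AngleSharp;
using AngleSharp.Dom;
using AngleSharp.Html;
using AngleSharp.Html.Parser;

// Hypothetical formatter: writes & instead of &amp; in src/href attributes.
public class RawAmpersandFormatter : HtmlMarkupFormatter
{
    // Assumption: Attribute(IAttr) is protected virtual in the base class.
    protected override string Attribute(IAttr attr)
    {
        var markup = base.Attribute(attr);
        if (attr.LocalName == "src" || attr.LocalName == "href")
        {
            // Caveat: this also rewrites values that legitimately contained "&amp;" as text.
            return markup.Replace("&amp;", "&");
        }
        return markup;
    }
}

public static class RawAmpersandDemo
{
    // Parses the markup and serializes the body content with the custom formatter.
    public static string Serialize(string html)
    {
        var document = new HtmlParser().ParseDocument(html);
        return document.Body?.ChildNodes.ToHtml(new RawAmpersandFormatter()) ?? string.Empty;
    }
}
```

With HtmlSanitizer, an instance would be assigned to the OutputFormatter property shown in the original code. Note that the resulting markup relies on parsers tolerating a bare & in attribute values, so the escaped form remains the safer default.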
