Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Style attribute triggers a RemovingAttribute Event with an empty value #504

Open
florianculie opened this issue Nov 20, 2023 · 8 comments
Open

Comments

@florianculie
Copy link

Hi,

We are using the HtmlSanitizer 8.0.746 release to filter user input. To be precise, we do not sanitize it, we forbid the user to write the data if HtmlSanitizer triggers a RemovingN event.

We are having problems regarding the following HTML string :

<p style="text-align: start;">this is the content of the p tag</p>

We have a custom RemovingAttributeEvent that we are adding like this :

private void InitSanitizerEvents()
{
    // We adapt all the following events to set the IsValid variable to false if one of them trigger.
    // We also log the error message according to what was detected.
    _sanitizer.RemovingTag += RemovingTagEvent;
    _sanitizer.RemovingAtRule += RemovingAtRuleEvent;
    _sanitizer.RemovingStyle += RemovingStyleEvent;
    _sanitizer.RemovingAttribute += RemovingAttributeEvent;
    _sanitizer.RemovingComment += RemovingCommentEvent;
    _sanitizer.RemovingCssClass += RemovingCssClassEvent;
}

The code of our custom event is the following :

private void RemovingAttributeEvent(object sender, RemovingAttributeEventArgs args)
{
    string attName = args.Attribute.Name;
    string attValue = args.Attribute.Value;

    // Check if this is an event with allowed values, and if the value is allowed.
    if (_allowedEvents.TryGetValue(attName, out ISet<string> allowedValues))
    {
        if (allowedValues.Contains(attValue))
            args.Cancel = true;
    }
    else
    {
        // Handle specific cases in a switch
        switch (attName)
        {
            // We need this case as the 'class' attribute trigger this event, but we don't want to treat it here
            case "class":
                args.Cancel = true;
                break;
            case "src":
                if (attValue.StartsWith("data:image/", StringComparison.Ordinal))
                    args.Cancel = true;
                else
                    OnUnauthorizedLinkError(attValue);

                break;
            case "href":
                if (attValue.StartsWith("mailto:", StringComparison.Ordinal))
                {
                    try
                    {
                        var providedEmail = attValue.Split(':')[1];
                        var email = new System.Net.Mail.MailAddress(providedEmail);

                        args.Cancel = email.Address.Equals(providedEmail, StringComparison.Ordinal);
                    }
                    catch (Exception ex) when (ex is ArgumentException || ex is FormatException)
                    {/* error in parsing value means that email is invalid, which leads to security error. */}
                }
                else
                    OnUnauthorizedLinkError(attValue);

                break;
        }
    }

    if (!args.Cancel)
    {
        _isRichTextValid = false;
        _errors.Add(new CssAttributeError(args.Reason.ToString(), args.Tag.TagName, attName));
    }
}

To put it simply, base on the logic I described at the begining, we cancel the RemovingAttribute Event if the reason for the trigger is considered safe for us, otherwise we confirm the trigger, and we add the reason to the _error object to log all reasons.

The problem we are facing is the fact that the event is being triggered with an "incomplete" html, as you can see below :
image
Which gives us an args.Attribute.Name = "style" but an empty value for the actual style.

During the investigation of the issue, I did notice we were using the 6.0 version of your package, and by updating to 8.0, I still reproduce it. On a side note, we have been using your package for 6 year now, and we do not reproduce this issue with the 4.0.183 version, the version being used in an old version of our software.

Do you happen to have any idea if this is a normal behaviour or a bug ?

@mganss
Copy link
Owner

mganss commented Nov 20, 2023

I can't reproduce the issue. Can you try and add a code snippet here that shows the issue?

This is what I tried:

var html = @"<p style=""text-align: start;"">this is the content of the p tag</p>";
var sanitizer = new HtmlSanitizer();
sanitizer.AllowedCssProperties.Remove("text-align");
sanitizer.AllowedAttributes.Remove("style");
sanitizer.RemovingAttribute += (s, e) =>
{
    Assert.Equal("text-align: start;", e.Tag.Attributes[0].Value);
};
var output = sanitizer.Sanitize(html);

@florianculie
Copy link
Author

Here's the Program.cs content from a new .NET 6 Console App :

using Ganss.Xss;

namespace ConsoleApp1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello, World!");

            var html = @"<p style=""text-align: start;"">this is the content of the p tag</p>";
            var sanitizer = new HtmlSanitizer();

//If you comment the two following lines, you will reproduce
            sanitizer.AllowedCssProperties.Remove("text-align");
            sanitizer.AllowedAttributes.Remove("style");
//
            sanitizer.RemovingAttribute += (s, e) =>
            {
                string test = e.Tag.Attributes[0].Value;
            };
            var output = sanitizer.Sanitize(html);

        }
    }
}

If I use your code, with the two .Remove calls, I have no issue. However, if I comment those two lines (since I don't want to disallow them), you can see that the

e.Tag.Attributes[0].Value;

is an empty string, and the e.Tag.OuterHtml does not have the style value :
image

@mganss
Copy link
Owner

mganss commented Nov 20, 2023

OK thanks, I can repro now. This occurs because the parsed style is empty in AngleSharp's CSSOM. Strangely, it does not occur if the style's value is text-align: left for example. On MDN it says for the initial value of text-align

start, or a nameless value that acts as left if direction is ltr, right if direction is rtl if start is not supported by the browser.

Perhaps start is not supported by AngleSharp and that's why it's not reflected in the OM? I'll open an issue in the AngleSharp.Css repo to find out what's going on.

@mganss
Copy link
Owner

mganss commented Nov 20, 2023

I have reported as AngleSharp/AngleSharp.Css#151

@mganss
Copy link
Owner

mganss commented Nov 21, 2023

The start value for text-align is not yet supported by AngleSharp.Css.

@tiesont
Copy link

tiesont commented Nov 22, 2023

For what it's worth, version 4.0.1830 uses AngleSharp 0.9.9.1. AngleSharp split the CSS processing to a separate library in 0.10.0, so I assume any version of HtmlSanitizer from v5 onwards (which uses AngleSharp 0.13.0) probably behaves differently with respect to CSS parsing.

@florianculie
Copy link
Author

Sorry for the delay in response.
Thanks for your analysis.
As it seems it's an invalid value, I think we will remove the style attribute from the problematic data for the project.

If I am correct with my understanding, we need to wait for Anglesharp to fix the issue your declared earlier, then migrate to the latest pre-release of HtmlSanitizer that uses this AngleSharp release. Is that it ?
If so, do you happen to have a "know issues" with the newest version of AngleSharp ?

Thanks for your time on this subject.

@mganss
Copy link
Owner

mganss commented Nov 22, 2023

@tiesont You're right. This scenario used to work before AngleSharp 0.10, see AngleSharp/AngleSharp.Css#151 (comment)

@florianculie My hope is that the next release of AngleSharp.Css will be 1.0 which would mean you would no longer need a prerelease version of HtmlSanitizer. But in general, yes, you would need to wait for a new release of AngleSharp.Css.

I'm not aware of any issues with the newest versions of AngleSharp and AngleSharp.Css. This issue is the only regression I'm aware of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants