HTML entities are replaced with the actual symbols in the compiled css #1219

Closed
mtskf opened this issue Jan 28, 2021 · 26 comments

@mtskf

mtskf commented Jan 28, 2021

🐛 bug report

HTML entities are replaced with the actual symbols in the compiled css.
And thus, the symbols show up garbled.

For example, if I compile SCSS for a pseudo-element that uses an HTML entity like this:

Source SCSS:

@charset "UTF-8";
#test:before { content: "\2713"; }

Compiled css:

@charset "UTF-8";
#test:before {
  content: "✓";
}

The HTML entity '\2713' is replaced with the actual symbol '✓' in the compiled CSS,
and it then shows up as "e✓" in browsers.

@nex3
Contributor

nex3 commented Jan 29, 2021

The UTF-8 character is semantically identical to the escape as long as the CSS file itself is being parsed as UTF-8, and as long as the @charset declaration is there all browsers should parse it as such. What browser version are you using? Can you provide a website that reproduces the mangled rendering?
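
The equivalence described above can be sketched in JavaScript. This is only an illustration, not how Sass or any browser is implemented: `decodeCssEscapes` is a hypothetical helper that handles hex escapes only (a backslash, one to six hex digits, and one optional trailing whitespace character).

```javascript
// Hypothetical helper (not part of Sass): decode CSS hex escapes into the
// characters they stand for. Other escapes such as \" are left untouched.
function decodeCssEscapes(text) {
  return text.replace(/\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\n\r\f])?/g, (_, hex) => {
    const codePoint = parseInt(hex, 16);
    // Zero, surrogates, and out-of-range values become U+FFFD.
    const invalid =
      codePoint === 0 ||
      codePoint > 0x10ffff ||
      (codePoint >= 0xd800 && codePoint <= 0xdfff);
    return String.fromCodePoint(invalid ? 0xfffd : codePoint);
  });
}

const fromEscape = decodeCssEscapes('\\2713'); // the five characters \2713
const literal = '\u2713';                      // the check mark itself
console.log(fromEscape === literal);
```

Once decoded, the escape and the literal character are the same string, which is why a parser that reads the file as UTF-8 treats them identically.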

@lhtin

lhtin commented Feb 8, 2021

Hi, I ran into the same problem. Here is the information:

System Info

  • Operating System: macOS Big Sur
  • Node Version: v14.15.4
  • NPM Version: v6.14.10

Reproduce

Create two files:

  • test.js file:
const nodeSass = require('node-sass');
const result1 = nodeSass.renderSync({file: "./test.scss"});
console.log('node-sass:\n', result1.css.toString());

const sass = require('sass');
const result2 = sass.renderSync({file: './test.scss'});
console.log('sass:\n', result2.css.toString());
  • test.scss file:
.a-icon {
  content: "\E91E";
}

Then run node test.js. The output looks like this:

image

The problem is that the outputs of node-sass and sass are different, and I think node-sass's output is right. Thank you for your time.

@lhtin

lhtin commented Feb 8, 2021

When I viewed the output with the Hex Fiend tool, I found the difference:

src: 5C45393145
after node-sass: 5C45393145
after sass: EEA49E

Sass converts the escape sequence (5C45393145) to the real value (EEA49E), but node-sass doesn't. I think the escape sequence is more suitable in CSS, and the real value sometimes causes Chrome to render icon fonts as garbled characters:

image
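
The byte values above can be reproduced without a hex editor. A minimal sketch, assuming Node.js (for the global `Buffer`); the variable names are illustrative only:

```javascript
// Bytes of the escape as written in the source: backslash, E, 9, 1, E (all ASCII).
const escapeBytes = Buffer.from('\\E91E', 'utf8').toString('hex').toUpperCase();

// Bytes of the literal character U+E91E when the file is encoded as UTF-8.
const literalBytes = Buffer.from('\uE91E', 'utf8').toString('hex').toUpperCase();

console.log(escapeBytes);  // 5C45393145
console.log(literalBytes); // EEA49E
```

The first form is five ASCII bytes and the second is a three-byte UTF-8 sequence, matching what Hex Fiend showed.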

@nex3
Contributor

nex3 commented Feb 8, 2021

As above, please provide an example of a browser version and web page where this is rendering incorrectly, because according to the CSS spec the two examples you provided have identical semantics.

@lhtin

lhtin commented Feb 9, 2021

Here is a little demo; it looks like the following screenshot. I found that if I omit the @charset "UTF-8" in the CSS file, the real value doesn't work, but the escape works fine. So you are right, but maybe the escape has better compatibility, since it doesn't depend on the browser detecting the CSS file's charset. My CSS file did not get the @charset "UTF-8" declaration when I used sass-loader to compile the SCSS file, so the problem occurred.

Screenshot:
image
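
Why the literal character breaks without a charset declaration can be sketched as follows. This assumes Node.js and uses Latin-1 as a stand-in for whatever non-UTF-8 fallback encoding a browser might pick; the actual fallback depends on the browser and the page.

```javascript
// UTF-8 bytes of the literal check mark U+2713: E2 9C 93.
const utf8Bytes = Buffer.from('\u2713', 'utf8');

// Reading those bytes as Latin-1 (assumed fallback) yields three unrelated
// characters instead of one check mark.
const misread = utf8Bytes.toString('latin1');
console.log(misread.length); // 3

// The ASCII escape survives the same misreading unchanged.
const escapeMisread = Buffer.from('\\2713', 'utf8').toString('latin1');
console.log(escapeMisread === '\\2713'); // true
```

The escape is pure ASCII, so it decodes the same way under nearly any encoding, whereas the literal only decodes correctly when the file is actually read as UTF-8.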

@nex3
Contributor

nex3 commented Feb 9, 2021

If you're deleting the @charset declaration, then you're changing the semantics of the CSS from what Sass generates and you shouldn't be surprised that it can render incorrectly. If sass-loader is removing it, that sounds like a bug; I suggest you file it with them.

To be clear, we don't choose to emit real Unicode characters capriciously. We do so because it's considerably more compact than generating escapes and (more importantly) because otherwise anyone writing class names, comments, or content strings in non-ASCII-friendly languages would find their compiled stylesheets hopelessly illegible if they were just a bunch of escape codes. If you would prefer escapes, you can always postprocess the CSS with a tool like this one.

@lhtin

lhtin commented Feb 9, 2021

Thank you so much for your reply. Would it be better to keep the original content rather than always emitting Unicode characters? That is, regardless of whether people use Unicode characters or escapes, always preserve them in the compiled file. I think unchanged content is friendlier: it keeps Unicode characters for non-ASCII-friendly languages and also keeps the escapes used for icon fonts.

@nex3
Contributor

nex3 commented Feb 9, 2021

Whether a character was written as an escape sequence or as a literal character is a detail of its parsing. There's no efficient way to preserve that information through to the point where that character gets serialized to CSS again.

@lhtin

lhtin commented Feb 10, 2021

Really? Maybe you could add a flag to each token during parsing that records whether it was escaped, then use it to determine how to serialize. Another way may be to just treat escape sequences as ASCII characters and let the browser parse the escapes. I found that node-sass behaves like that:

scss file:

.a-icon {
  content: "\E91E";
}
.a-iconb {
  content: "你好"
}

after compiling with node-sass:

@charset "UTF-8";
.a-icon {
  content: "\E91E"; }

.a-iconb {
  content: "你好"; }

after compiling with sass:

@charset "UTF-8";
.a-icon {
  content: "";
}

.a-iconb {
  content: "你好";
}

@nykoleks

nykoleks commented Feb 16, 2021

Hi there. I have the same problem
content:"\e935"; in .scss converting to content:"" in .css
This is very bad for working with icon fonts.
It breaks bootstrap-glyphicon, fontawesome, any custom icon font... terribly...
What do I need to write so that "\e935" in .scss comes out as "\e935" in .css?

\\ - this doesn't work - output: \\
I tried:
--no-unicode
and
--no-charset
and
writing @charset "ASCII"; (and @charset "UTF-8") at the top of the base file, with and without the options above, together and separately;

What do I need to do to force the code to compile correctly???

MacBook
OS: Big Sur
sass: Dart-sass 1.32.7 installed with brew (brew install sass/sass/sass)

Before the "global" update I used Ruby Sass and everything worked fine.

The following hack works:


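// Builds the text '\<code>' with an escaped backslash, so Sass never parses
// it as an escape; interpolating the result emits it without extra quotes.
@function symbol-fix($symbol){
  $ret: '\'\\#{$symbol}\'';
  @return $ret;
}
.i-test:before {
  content: #{symbol-fix(e900)};
}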

But it is a very, very bad solution.
It worked for one small project that needed a very quick fix, but it is not applicable to large projects.
So if the Sass developers don't come up with a fast and good solution for this issue, then (I don't like this, but I have no choice) I will have to convert all my Sass to .less. Please don't force me to do this... I don't like .less... cr**...

@nex3
Contributor

nex3 commented Feb 17, 2021

I'm closing this as a duplicate of #568, since it looks like there isn't a case where a browser is actually rendering the UTF-8 character incorrectly. I'll still address some of the outstanding questions here, though.

@lhtin

Maybe you could add a flag to each token during parsing that records whether it was escaped, then use it to determine how to serialize.

What do we do about a stylesheet that uses some escapes and some literal Unicode characters? What if we have a situation where a user is (incorrectly) relying on their stylesheet to be emitted as plain-ASCII, but then a dependency uses the word "naïve" in a comment and breaks them?

Generally speaking, using heuristics like this just makes the behavior feel even more inconsistent and capricious than having a configurable option.

Another way may be to just treat escape sequences as ASCII characters and let the browser parse the escapes.

This would violate Sass's fundamental design principle of being a CSS superset. According to CSS, the token "\2603" and the token "☃" are identical in meaning, but if we just didn't touch escape codes then "\2603" == "☃" would return false. It would similarly break all of Sass's string functions.

@nykoleks

As I explained above, the Unicode character has the exact same semantics in CSS as the escape code. Even though it looks different if you inspect the generated CSS in a text editor, it won't cause the browser to render it any different. (Unless you mess with the @charset declaration at the top, in which case—don't do that!)

If you want a workaround, I'll again recommend using a postprocessor like postcss-sass-unicode which will convert UTF-8 characters into escape codes.
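
What such a postprocessing step does can be sketched in a few lines. This is not the implementation of postcss-sass-unicode; `escapeNonAscii` is a hypothetical helper that rewrites every non-ASCII character in a CSS string, including those in comments and selectors, as a hex escape.

```javascript
// Hypothetical helper: replace each non-ASCII code point with a CSS hex escape.
function escapeNonAscii(css) {
  // The `u` flag makes the pattern match whole code points, not UTF-16 halves.
  return css.replace(/[^\x00-\x7F]/gu, (char, offset, whole) => {
    const hex = char.codePointAt(0).toString(16);
    const next = whole[offset + char.length] || '';
    // A following hex digit would be read as part of the escape, and a
    // following whitespace character would be swallowed as its terminator,
    // so add an explicit terminating space in those cases.
    const needsSeparator = /^[0-9a-fA-F \t\n\r\f]$/.test(next);
    return '\\' + hex + (needsSeparator ? ' ' : '');
  });
}

console.log(escapeNonAscii('#test:before { content: "\u2713"; }'));
// #test:before { content: "\2713"; }
```

The output contains only ASCII, so it no longer depends on the stylesheet being read as UTF-8.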

@nex3 nex3 closed this as completed Feb 17, 2021
@lhtin

lhtin commented Feb 18, 2021

This would violate Sass's fundamental design principle of being a CSS superset. According to CSS, the token "\2603" and the token "☃" are identical in meaning, but if we just didn't touch escape codes then "\2603" == "☃" would return false. It would similarly break all of Sass's string functions.

  1. Since the two tokens are identical, keeping tokens that are valid in CSS unchanged when compiling from Sass source should not conflict with the design principles, right? I think leaving the encoding of characters unchanged is very friendly and important for Sass's users.
  2. As for breaking Sass's string functions, can that be solved by modifying the implementations of those functions?

@lhtin

lhtin commented Feb 18, 2021

@nykoleks Can you check whether the output file (the CSS file) has the @charset "UTF-8"; string at the beginning? If it doesn't, you can add it and try again.
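
A check like this can be automated. The sketch below assumes Node.js; `declaresUtf8` is a hypothetical helper that looks only at the first bytes of the file for a UTF-8 BOM or a leading @charset rule, and ignores any charset sent in HTTP headers.

```javascript
// Hypothetical helper: does this CSS file mark itself as UTF-8?
function declaresUtf8(bytes) {
  // UTF-8 byte order mark: EF BB BF.
  const hasBom =
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf;

  // The @charset rule only counts when it is the very first thing in the file.
  const head = bytes.subarray(0, 64).toString('latin1');
  const match = /^@charset "([^"]*)";/.exec(head);
  const hasCharset = match !== null && match[1].toLowerCase() === 'utf-8';

  return hasBom || hasCharset;
}

console.log(declaresUtf8(Buffer.from('@charset "UTF-8";\n.a-icon { }')));
console.log(declaresUtf8(Buffer.from('.a-icon { }')));
```

In practice the bytes would come from reading the compiled CSS file, for example after sass-loader or another build step has processed it.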

@nex3
Contributor

nex3 commented Feb 18, 2021

  • Since the two tokens are identical, keeping tokens that are valid in CSS unchanged when compiling from Sass source should not conflict with the design principles, right? I think leaving the encoding of characters unchanged is very friendly and important for Sass's users.
  • As for breaking Sass's string functions, can that be solved by modifying the implementations of those functions?

Neither of these are technically feasible. Tracking the original state of each character in a string would require a considerable amount of memory and processing overhead for every string Sass manages, and trying to decode escapes on-the-fly in every string function would similarly add a massive overhead to those functions (including extremely unintuitive performance characteristics like str.length() being O(n)). Neither of these are a better solution than simply globally choosing the encoding of the output.
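
The length example can be made concrete with a sketch. This is not Sass's implementation; `lengthWithLazyEscapes` is a hypothetical function showing what computing a length would involve if strings were stored with their escapes undecoded.

```javascript
// Hypothetical: count the characters in a string whose escapes were never
// decoded. Every call has to walk the whole string, so the cost is O(n).
function lengthWithLazyEscapes(raw) {
  let count = 0;
  let i = 0;
  while (i < raw.length) {
    if (raw[i] === '\\') {
      i++; // skip the backslash
      let digits = 0;
      while (digits < 6 && i < raw.length && /[0-9a-fA-F]/.test(raw[i])) {
        i++;
        digits++;
      }
      if (digits === 0) {
        i++; // non-hex escape such as \" : skip the escaped character
      } else if (i < raw.length && /[ \t\n\r\f]/.test(raw[i])) {
        i++; // optional whitespace terminating a hex escape
      }
    } else {
      i++; // literal character (counted per UTF-16 unit, for simplicity)
    }
    count++;
  }
  return count;
}

console.log(lengthWithLazyEscapes('\\2603')); // 1
console.log('\u2603'.length);                 // 1
```

With decoded storage the length is a single property read, while the lazy variant must rescan the text on every call.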

@lhtin

lhtin commented Feb 19, 2021

Given your answer, I'm curious how node-sass can do that. Just like the example above:

scss file:

.a-icon {
  content: "\E91E";
}
.a-iconb {
  content: "你好"
}

after compiling with node-sass:

@charset "UTF-8";
.a-icon {
  content: "\E91E"; }

.a-iconb {
  content: "你好"; }

after compiling with sass:

@charset "UTF-8";
.a-icon {
  content: "";
}

.a-iconb {
  content: "你好";
}

@nex3
Contributor

nex3 commented Feb 19, 2021

LibSass's string parsing is outdated and incorrect. It doesn't follow CSS semantics and won't behave correctly with string functions or equality. This is part of the reason that LibSass is deprecated.

@RYJASM

RYJASM commented Feb 28, 2021

I am currently experiencing this issue as well. Even though the unicode character is there, many of the fonts used in my editor don't have that symbol, so I cannot tell what it is any longer. With the code, I could.

This is an issue because I'd like to keep a tabulated list of what values are assigned to what class. Without being able to see that in the css, it becomes difficult to diagnose issues or edit code and refactor it within systems after the sass has been compiled.

I also have issues actually using the code, because of the limited character set in several of the systems' databases that I use and only have front end or mid end access to. They simply will not accept the outputted code any longer due to the odd characters now emitted by the compiler.

To me it's a major drawback for sass/scss to convert the characters to something different than what I intended and seems akin to changing color values like #fff to named colors or not respecting my chosen gender. I'd certainly hate to be given a different gender at birth and not get to go with what I choose.

It's the same with writing the representation of a value vs converting it.

This is too hands on for a compiler. The default action should be to leave it as is and only convert it when a special marker is in place.

So to any of you magical coders out there doing the right thing and fixing issues, the calls of humanity are upon you and we are waiting on our knees for your kindness.

@cbush06

cbush06 commented Mar 1, 2021

I am having the same issue when trying to use Font Awesome unicodes in CSS content properties for :before pseudo elements with an Angular CLI project. I'm reverting to node-sass and hope that Dart Sass will fix this issue in the future.

@Awjin
Contributor

Awjin commented Mar 2, 2021

@RYJASM I understand you might be frustrated, but let's keep this discussion centered on code. It's inappropriate to equate Sass's string parsing (which correctly follows CSS semantics and avoids bad performance) to the ongoing trauma and prejudice of gender issues.

@nex3
Contributor

nex3 commented Mar 17, 2021

This issue is getting a bit heated, so I'm going to lock it. Here's the final summary:

  • Per the CSS spec, Unicode escapes have the exact same behavior as literal Unicode characters as long as a @charset "UTF-8" declaration or UTF-8 BOM is present. What's more, all browsers implement this correctly as far as I'm aware.

  • If you're seeing different behavior, it's almost certainly because you (or another tool you're using) deleted the @charset declaration or BOM, and you're serving your CSS with a non-UTF-8 character set declared in the HTTP headers.

  • For performance and correctness reasons, Sass cannot track whether each non-ASCII character was written as a literal Unicode character or an escape.

  • The best way to handle this is #568 (Add ASCII output support), which would add an --ascii-only flag to avoid emitting Unicode at all. In the meantime, you can use postcss-sass-unicode to convert your CSS after it's been generated.

@sass sass locked as too heated and limited conversation to collaborators Mar 17, 2021