HTML entities are replaced with the actual symbols in the compiled css #1219

mtskf · 2021-01-28T02:35:18Z

🐛 bug report

HTML entities are replaced with the actual symbols in the compiled CSS,
and as a result the symbols show up garbled.

For example, if I compile SCSS that has a pseudo-element whose content is an HTML entity, like this:

Source SCSS:

@charset "UTF-8";
#test:before { content: "\2713"; }

Compiled css:

@charset "UTF-8";
#test:before {
  content: "✓";
}

The HTML entity '\2713' is replaced with the actual symbol '✓' in the compiled CSS,
and browsers then display it as "eâœ“".
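The garbage characters are consistent with the UTF-8 bytes of "✓" being read with a single-byte legacy encoding. The sketch below assumes the browser fell back to Windows-1252. It encodes U+2713 as UTF-8 and then maps each byte through the three Windows-1252 entries involved. The lookup object is a hypothetical subset for illustration, not a full decoder.

```javascript
// Sketch, assuming the stylesheet was decoded as Windows-1252 instead of UTF-8.
const bytes = Buffer.from("\u2713", "utf8"); // U+2713 CHECK MARK
const hex = bytes.toString("hex");
console.log(hex); // e29c93

// Hypothetical subset of Windows-1252 covering only these three bytes:
// 0xE2 -> "â" (U+00E2), 0x9C -> "œ" (U+0153), 0x93 -> "“" (U+201C).
const cp1252Subset = { 0xe2: "\u00e2", 0x9c: "\u0153", 0x93: "\u201c" };
const misread = Array.from(bytes, (b) => cp1252Subset[b]).join("");
console.log(misread); // âœ“
```

The three bytes are a correct encoding of the checkmark. The corruption happens only when the reader guesses the wrong encoding.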


nex3 · 2021-01-29T00:12:59Z

The UTF-8 character is semantically identical to the escape as long as the CSS file itself is being parsed as UTF-8, and as long as the @charset declaration is there all browsers should parse it as such. What browser version are you using? Can you provide a website that reproduces the mangled rendering?
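To illustrate why the escape and the literal character are the same value, here is a simplified sketch of CSS escape decoding modeled on "consume an escaped code point" in CSS Syntax Level 3. It is not Sass's implementation. `decodeCssEscapes` is a hypothetical helper. It handles one to six hex digits, one optional trailing whitespace, and the U+FFFD replacement for invalid code points. Escaped newlines are deliberately ignored.

```javascript
// Simplified CSS escape decoder (hypothetical helper, not Sass's code).
function decodeCssEscapes(text) {
  return text.replace(
    // \ + 1-6 hex digits + optional single whitespace, or \ + any other char
    /\\([0-9a-fA-F]{1,6})(\r\n|[ \t\n\r\f])?|\\([^\n\r\f0-9a-fA-F])/g,
    (match, hex, _ws, literal) => {
      if (literal !== undefined) return literal; // e.g. \" -> "
      const cp = parseInt(hex, 16);
      // Zero, surrogates, and values above U+10FFFF become U+FFFD.
      if (cp === 0 || (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff) {
        return "\uFFFD";
      }
      return String.fromCodePoint(cp);
    }
  );
}

console.log(decodeCssEscapes("\\2713") === "\u2713"); // true
```

Once decoded, `\2713` and `✓` are the same one-character string. This is why a compiler can emit either spelling.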

lhtin · 2021-02-08T03:39:49Z

Hi, I ran into the same problem. Here is the information:

System Info

Operating System: macOS Big Sur
Node Version: v14.15.4
NPM Version: v6.14.10

Reproduce

Create two files:

test.js file:

const nodeSass = require('node-sass');
const result1 = nodeSass.renderSync({file: "./test.scss"});
console.log('node-sass:\n', result1.css.toString());

const sass = require('sass');
const result2 = sass.renderSync({file: './test.scss'});
console.log('sass:\n', result2.css.toString());

test.scss file:

.a-icon {
  content: "\E91E";
}

Then run node test.js; the output looks like this:

The problem is that the outputs of node-sass and sass are different, and I think node-sass's output is the correct one. Thank you for your time.

lhtin · 2021-02-08T06:46:21Z

When I viewed the output with the Hex Fiend tool, I found the difference:

src: 5C45393145
after node-sass: 5C45393145
after sass: EEA49E

Sass converts the escape sequence (5C45393145) into the actual character (EEA49E), but node-sass doesn't. I think the escape sequence is more suitable for CSS, because the actual character sometimes causes Chrome to render the icon font garbled:
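The two hex dumps can be reproduced with Node's `Buffer`. The five bytes are the ASCII spelling `\E91E`, and the three bytes are the UTF-8 encoding of the single code point U+E91E. The sketch also checks a fact that probably explains why the character looks blank in editors: U+E91E lies in the Private Use Area (U+E000 to U+F8FF), which icon fonts typically use. Ordinary text fonts usually have no glyph there.

```javascript
// Reproduce the Hex Fiend comparison from the thread.
const escapeBytes = Buffer.from("\\E91E", "latin1").toString("hex"); // text \E91E
const literalBytes = Buffer.from("\uE91E", "utf8").toString("hex"); // code point U+E91E
console.log(escapeBytes);  // 5c45393145
console.log(literalBytes); // eea49e

// U+E91E is inside the BMP Private Use Area (U+E000 to U+F8FF).
const cp = 0xe91e;
const inPrivateUseArea = cp >= 0xe000 && cp <= 0xf8ff;
console.log(inPrivateUseArea); // true
```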

nex3 · 2021-02-08T22:18:55Z

As above, please provide an example of a browser version and web page where this is rendering incorrectly, because according to the CSS spec the two examples you provided have identical semantics.

lhtin · 2021-02-09T03:51:50Z

Here is a little demo; it looks like the following screenshot. I found that if I omit the @charset "UTF-8" from the CSS file, the literal-character version doesn't work, but the escaped version works fine. So you are right, but perhaps the escaped version is more compatible, because it doesn't depend on the browser detecting the CSS file's charset. My CSS file doesn't get an @charset "UTF-8" added when I use sass-loader to compile the SCSS file, which is why the problem occurred.

Screenshot:

nex3 · 2021-02-09T10:03:38Z

If you're deleting the @charset declaration, then you're changing the semantics of the CSS from what Sass generates and you shouldn't be surprised that it can render incorrectly. If sass-loader is removing it, that sounds like a bug; I suggest you file it with them.

To be clear, we don't choose to emit real Unicode characters capriciously. We do so because it's considerably more compact than generating escapes and (more importantly) because otherwise anyone writing class names, comments, or content strings in non-ASCII-friendly languages would find their compiled stylesheets hopelessly illegible if they were just a bunch of escape codes. If you would prefer escapes, you can always postprocess the CSS with a tool like this one.
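A postprocessing step of the kind described here can be sketched in a few lines. `escapeNonAscii` is a hypothetical helper, not the API of postcss-sass-unicode or any other tool. It rewrites every non-ASCII code point as a CSS hex escape. When the next character is a hex digit or whitespace, it adds one terminating space, because the parser would otherwise absorb that character into the escape. One caveat of this simple approach: it also rewrites characters inside comments, where browsers do not decode escapes. That is harmless for rendering but hurts legibility.

```javascript
// Hypothetical postprocessor: escape every non-ASCII code point in CSS text.
function escapeNonAscii(css) {
  let out = "";
  const chars = Array.from(css); // iterate by code point, not UTF-16 unit
  for (let i = 0; i < chars.length; i++) {
    const cp = chars[i].codePointAt(0);
    if (cp < 0x80) {
      out += chars[i];
      continue;
    }
    out += "\\" + cp.toString(16).toUpperCase();
    // A following hex digit or whitespace would be read as part of the
    // escape, so terminate the escape with a single space in that case.
    const next = chars[i + 1];
    if (next !== undefined && /[0-9a-fA-F \t\n\r\f]/.test(next)) {
      out += " ";
    }
  }
  return out;
}

console.log(escapeNonAscii('.a-icon { content: "\uE91E"; }'));
// .a-icon { content: "\E91E"; }
```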

lhtin · 2021-02-09T14:06:51Z

Thank you so much for your reply. Wouldn't it be better to keep the original content rather than always emitting Unicode characters? That is, regardless of whether people write Unicode characters or escapes, always preserve them as written in the compiled file. I think leaving the content unchanged is friendlier: it keeps Unicode characters for non-ASCII-friendly languages and also keeps escapes for icon fonts.

nex3 · 2021-02-09T21:15:10Z

Whether a character was written as an escape sequence or as a literal character is a detail of its parsing. There's no efficient way to preserve that information through to the point where that character gets serialized to CSS again.

lhtin · 2021-02-10T06:34:31Z

Really? Maybe you could add a flag to each token during parsing that records whether it was escaped, and then use it to decide how to serialize. Another option might be to treat escape sequences as plain ASCII text and let the browser parse the escapes. I found that node-sass behaves like that:

scss file:

.a-icon {
  content: "\E91E";
}
.a-iconb {
  content: "你好"
}

after compiling with node-sass:

@charset "UTF-8";
.a-icon {
  content: "\E91E"; }

.a-iconb {
  content: "你好"; }

after compiling with sass:

@charset "UTF-8";
.a-icon {
  content: "";
}

.a-iconb {
  content: "你好";
}

nykoleks · 2021-02-16T18:19:24Z

Hi there. I have the same problem:
content:"\e935"; in .scss is converted to content:"" in .css.
This is very bad for working with icon fonts.
It broke bootstrap-glyphicon, fontawesome, and every custom-made icon font... terribly...
How should I write it so that if I write "\e935" in .scss, I get "\e935" in .css?

Writing \\ doesn't work either; the output is: \\
I tried --no-unicode and --no-charset, and I wrote @charset "ASCII"; (and @charset "UTF-8") at the top of the base file, with and without those flags, together and separately.

What do I need to do to force it to compile correctly???

MacBook
OS: Big Sur
sass: Dart-sass 1.32.7 installed with brew (brew install sass/sass/sass)

Before the "global" update I used Ruby Sass and everything worked fine.

The following hack works:

@function symbol-fix($symbol){
  $ret: '\'\\#{$symbol}\'';
  @return $ret;
}
.i-test:before {
  content: #{symbol-fix(e900)};
}

But it is a very, very, very bad solution.
It worked for one small project that needed a very quick fix, but it is not applicable to large projects.
So if the Sass developers don't create a fast and good solution for this issue, then (I don't like this option, but I have no choice) I will have to convert all my Sass to Less. Please don't force me to do this... I didn't like Less... cr**...

nex3 · 2021-02-17T01:21:44Z

I'm closing this as a duplicate of #568, since it looks like there isn't a case where a browser actually renders the UTF-8 character incorrectly. I'll still address some of the outstanding questions here, though.

@lhtin

Maybe you could add a flag to each token during parsing that records whether it was escaped, and then use it to decide how to serialize.

What do we do about a stylesheet that uses some escapes and some literal Unicode characters? What if we have a situation where a user is (incorrectly) relying on their stylesheet to be emitted as plain-ASCII, but then a dependency uses the word "naïve" in a comment and breaks them?

Generally speaking, using heuristics like this just makes the behavior feel even more inconsistent and capricious than having a configurable option.

Another option might be to treat escape sequences as plain ASCII text and let the browser parse the escapes.

This would violate Sass's fundamental design principle of being a CSS superset. According to CSS, the token "\2603" and the token "☃" are identical in meaning, but if we just didn't touch escape codes then "\2603" == "☃" would return false. It would similarly break all of Sass's string functions.
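The equality point can be made concrete. The sketch below is a hypothetical illustration and does not reflect how Sass stores strings. If escapes were kept as raw source text, comparing two strings would compare their spellings. Decoding the escape first makes them compare equal, as CSS semantics require.

```javascript
// Hypothetical: comparing raw source spellings vs. decoded values.
const raw = "\\2603";     // the five characters \ 2 6 0 3, as written
const literal = "\u2603"; // the single character ☃

console.log(raw === literal); // false: different spellings

// Decoding the escape first makes the two equal, as CSS requires.
const decoded = String.fromCodePoint(parseInt(raw.slice(1), 16));
console.log(decoded === literal); // true
```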

@nykoleks

As I explained above, the Unicode character has the exact same semantics in CSS as the escape code. Even though it looks different if you inspect the generated CSS in a text editor, it won't cause the browser to render it any different. (Unless you mess with the @charset declaration at the top, in which case—don't do that!)

If you want a workaround, I'll again recommend using a postprocessor like postcss-sass-unicode which will convert UTF-8 characters into escape codes.

lhtin · 2021-02-18T06:44:34Z

This would violate Sass's fundamental design principle of being a CSS superset. According to CSS, the token "\2603" and the token "☃" are identical in meaning, but if we just didn't touch escape codes then "\2603" == "☃" would return false. It would similarly break all of Sass's string functions.

Since the two tokens are identical, keeping tokens that are valid CSS unchanged when compiling from Sass source shouldn't conflict with that design principle, right? I think leaving the encoding of characters unchanged is very friendly and important for Sass's users.
As for breaking Sass's string functions, could that be solved by modifying the implementations of those functions?

lhtin · 2021-02-18T06:49:40Z

@nykoleks Can you check whether the output CSS file begins with @charset "UTF-8";? If it doesn't, add it and try again.
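If a build step drops the declaration, one option is to restore it after compilation. `ensureUtf8Charset` is a hypothetical helper, not part of sass or sass-loader. It prepends `@charset "UTF-8";` only when the CSS contains non-ASCII characters and does not already start with a `@charset` rule or a UTF-8 BOM. This sketch assumes the CSS is not served with a conflicting charset in its HTTP headers, which would take precedence over the rule.

```javascript
// Hypothetical postprocessing step: restore the UTF-8 charset declaration.
function ensureUtf8Charset(css) {
  const hasNonAscii = /[^\x00-\x7F]/.test(css);
  const hasDeclaration = css.startsWith("@charset") || css.startsWith("\uFEFF");
  if (!hasNonAscii || hasDeclaration) return css;
  // The rule must be the very first thing in the file to take effect.
  return '@charset "UTF-8";\n' + css;
}

console.log(ensureUtf8Charset('.a-icon { content: "\uE91E"; }').split("\n")[0]);
// @charset "UTF-8";
```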

nex3 · 2021-02-18T20:46:02Z

Since the two tokens are identical, keeping tokens that are valid CSS unchanged when compiling from Sass source shouldn't conflict with that design principle, right? I think leaving the encoding of characters unchanged is very friendly and important for Sass's users.

As for breaking Sass's string functions, could that be solved by modifying the implementations of those functions?

Neither of these are technically feasible. Tracking the original state of each character in a string would require a considerable amount of memory and processing overhead for every string Sass manages, and trying to decode escapes on-the-fly in every string function would similarly add a massive overhead to those functions (including extremely unintuitive performance characteristics like str.length() being O(n)). Neither of these are a better solution than simply globally choosing the encoding of the output.
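A rough illustration of the O(n) point, under the hypothetical design where strings keep escapes verbatim. This is not how Dart Sass stores strings. Any character-counting function has to rescan the whole text and fold each escape into a single character on every call. `lengthOfEscapedText` is an illustrative name only.

```javascript
// Hypothetical: character count of text whose CSS escapes were kept as-is.
// Every call rescans the entire string, so each length query costs O(n).
function lengthOfEscapedText(raw) {
  const escape = /\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\n\r\f])?|\\[^\n\r\f0-9a-fA-F]/g;
  // Collapse each escape to one placeholder, then count code points.
  return Array.from(raw.replace(escape, "_")).length;
}

console.log(lengthOfEscapedText("\\E91E")); // 1
```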

lhtin · 2021-02-19T02:56:48Z

Given your answer, I'm curious how node-sass manages it. In my earlier example, node-sass keeps content: "\E91E" as an escape while still emitting content: "你好" as literal characters, whereas sass emits both as literal characters.

nex3 · 2021-02-19T21:07:41Z

LibSass's string parsing is outdated and incorrect. It doesn't follow CSS semantics and won't behave correctly with string functions or equality. This is part of the reason that LibSass is deprecated.

RYJASM · 2021-02-28T22:00:25Z

I am currently experiencing this issue as well. Even though the Unicode character is there, many of the fonts used in my editor don't have that symbol, so I can no longer tell what it is. With the escape code, I could.

This is an issue because I'd like to keep a tabulated list of what values are assigned to what class. Without being able to see that in the css, it becomes difficult to diagnose issues or edit code and refactor it within systems after the sass has been compiled.

I also have issues actually using the code, because of the limited character sets in the databases of several systems that I use and only have front-end or mid-tier access to. They simply will not accept the output any longer, due to the odd characters now emitted by the compiler.

To me it's a major drawback for sass/scss to convert the characters to something different than what I intended and seems akin to changing color values like #fff to named colors or not respecting my chosen gender. I'd certainly hate to be given a different gender at birth and not get to go with what I choose.

It's the same with writing the representation of a value vs converting it.

This is too hands-on for a compiler. The default action should be to leave it as is and only convert it when a special marker is in place.

So to any of you magical coders out there doing the right thing and fixing issues, the calls of humanity are upon you and we are waiting on our knees for your kindness.

cbush06 · 2021-03-01T02:43:29Z

I am having the same issue when trying to use Font Awesome unicodes in CSS content properties for :before pseudo elements with an Angular CLI project. I'm reverting to node-sass and hope that Dart Sass will fix this issue in the future.

Awjin · 2021-03-02T19:39:43Z

@RYJASM I understand you might be frustrated, but let's keep this discussion centered on code. It's inappropriate to equate Sass's string parsing (which correctly follows CSS semantics and avoids bad performance) to the ongoing trauma and prejudice of gender issues.

nex3 · 2021-03-17T21:52:13Z

This issue is getting a bit heated, so I'm going to lock it. Here's the final summary:

Per the CSS spec, Unicode escapes have the exact same behavior as literal Unicode characters as long as a @charset "UTF-8" declaration or UTF-8 BOM is present. What's more, all browsers implement this correctly as far as I'm aware.
If you're seeing different behavior, it's almost certainly because you (or another tool you're using) deleted the @charset declaration or BOM, and you're serving your CSS with a non-UTF-8 character set declared in the HTTP headers.
For performance and correctness reasons, Sass cannot track whether each non-ASCII character was written as a literal Unicode character or an escape.
The best way to handle this is #568 ("Add ASCII output support"), which would add an --ascii-only flag to avoid emitting Unicode at all. In the meantime, you can use postcss-sass-unicode to convert your CSS after it's been generated.

nex3 added the needs info label Jan 29, 2021

nykoleks mentioned this issue Feb 16, 2021

With the latest framework, Element's font icons are garbled in the build output? PanJiaChen/vue-element-admin#3526 (Open)

nex3 closed this as completed Feb 17, 2021

sass locked as too heated and limited conversation to collaborators Mar 17, 2021
