[tip] Name avatars #7

Open
canvascat opened this issue Jan 2, 2021 · 1 comment

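To take part of a name, you would normally reach for a method such as substr, substring, or slice.
Unicode emoji, however, usually have a length of 2 or more, so the slice can cut a character in half and return something other than what you wanted.
'😁😁不错哟'.substr(0, 3) returns "😁�"
Use Array.from to turn the string into an array instead. For example, Array.from('😁😁不错哟') returns ["😁", "😁", "不", "错", "哟"]. Then use slice to take as many characters as you need and join to concatenate them.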

Array.from('😁😁不错哟').slice(0, 3).join(''); // "😁😁不"
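
The one-liner above can be wrapped in a small reusable function. This is only a sketch: the name takeCodePoints is hypothetical and not part of any library, and it counts Unicode code points, which is what Array.from iterates over.

```javascript
// Hypothetical helper: keep the first `count` code points of `text`.
// Array.from iterates a string by code point, so surrogate pairs stay intact.
function takeCodePoints(text, count) {
  return Array.from(text).slice(0, count).join('');
}

const shortName = takeCodePoints('😁😁不错哟', 3);
console.log(shortName); // "😁😁不"

// For comparison: .length counts 16-bit units, Array.from counts code points.
console.log('😁😁不错哟'.length); // 7
console.log(Array.from('😁😁不错哟').length); // 5
```

If count is larger than the number of code points, the whole string comes back unchanged.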

canvascat commented Jan 2, 2021

Converting between Unicode emoji and code points

"😁".codePointAt(0).toString(16); // "1f601"
String.fromCodePoint(0x1f601); // "😁"
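
As a rough sketch, the same two calls can be applied to every character of a string. The helper name toCodePointLabels and the "U+XXXX" label format are assumptions chosen for illustration.

```javascript
// Hypothetical helper: list each character of `text` as a "U+XXXX" label.
function toCodePointLabels(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(toCodePointLabels('😁不')); // ["U+1F601", "U+4E0D"]

// charCodeAt only sees the first 16-bit unit of the emoji, not the code point.
console.log('😁'.charCodeAt(0).toString(16)); // "d83d"
console.log('😁'.codePointAt(0).toString(16)); // "1f601"
```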

CHARACTERS, CODEPOINTS, AND JAVASCRIPT STRINGS
JavaScript uses the UTF-16 encoding of the Unicode character set, and JavaScript strings are sequences of unsigned 16-bit values. The most commonly used Unicode characters (those from the “basic multilingual plane”) have codepoints that fit in 16 bits and can be represented by one element of a string. Unicode characters whose codepoints do not fit in 16 bits are encoded using the rules of UTF-16 as a sequence (known as a “surrogate pair”) of two 16-bit values. This means that a JavaScript string of length 2 (two 16-bit values) might represent only a single Unicode character:

let euro = '€';
let love = '💙';
euro.length; // => 1: this character has one 16-bit element
love.length; // => 2: UTF-16 encoding of 💙 is "\ud83d\udc99"

Most string-manipulation methods defined by JavaScript operate on 16-bit values, not characters.
They do not treat surrogate pairs specially, they perform no normalization of the string, and don’t even ensure that a string is well-formed UTF-16.
In ES6, however, strings are iterable, and if you use the for/of loop or ... operator with a string, it will iterate the actual characters of the string, not the 16-bit values.

Source: JavaScript: The Definitive Guide, Seventh Edition
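
The surrogate-pair encoding described in the quoted passage can be checked by hand. The sketch below applies the standard UTF-16 arithmetic to a code point above 0xFFFF; the function name toSurrogatePair is hypothetical.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;      // 20-bit value left after removing the BMP range
  const high = 0xd800 + (offset >> 10);    // top 10 bits go into the high surrogate
  const low = 0xdc00 + (offset & 0x3ff);   // bottom 10 bits go into the low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f601);
console.log(high.toString(16), low.toString(16)); // "d83d de01"
console.log(String.fromCharCode(high, low) === '😁'); // true
```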

Array.from('😁😁不错哟'); // => ["😁", "😁", "不", "错", "哟"]
[...'😁😁不错哟']; // => ["😁", "😁", "不", "错", "哟"]
for (const str of '😁😁不错哟') {
  console.log(str);
}
// 😁
// 😁
// 不
// 错
// 哟

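Because JavaScript string methods generally operate on 16-bit values, strings that contain characters outside the basic multilingual plane, such as emoji, need special handling.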
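
One caveat: iterating by code point still splits emoji that are built from several code points, such as family emoji joined with zero-width joiners. A possible sketch for that case uses Intl.Segmenter, assuming the runtime provides it. The helper name takeGraphemes and the fallback to code-point slicing are assumptions, not something from the original tip.

```javascript
// Hypothetical helper: keep the first `count` user-perceived characters (grapheme clusters).
function takeGraphemes(text, count, locale = 'en') {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    // Assumed fallback: slice by code point when Intl.Segmenter is unavailable.
    return Array.from(text).slice(0, count).join('');
  }
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  const graphemes = Array.from(segmenter.segment(text), (part) => part.segment);
  return graphemes.slice(0, count).join('');
}

// A family emoji: three emoji joined by two zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
console.log(Array.from(family).length); // 5 code points
console.log(takeGraphemes(family + '不错', 2) === family + '不'); // true where Intl.Segmenter exists
```

For names that only contain single-code-point emoji, the simpler Array.from approach gives the same result.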

canvascat reopened this May 19, 2021