fatal error: high- and low-surrogate code points are not valid Unicode scalar values

Question

Sometimes initializing a UnicodeScalar with a value like 57292 yields the following error:

fatal error: high- and low-surrogate code points are not valid Unicode scalar values

What is this error, why does it occur, and how can I prevent it in the future?

Recommended answer

Background: UTF-16 represents a sequence of Unicode characters ("code points") as a sequence of 16-bit "code units". For characters whose scalar values fit within 16 bits (i.e., those from U+0000 to U+FFFF), the code unit has the same value as the character; characters outside that range (those from U+10000 to U+10FFFF) need two code units. To make this work, Unicode reserves a range of code points (U+D800 to U+DFFF) as "surrogates", which cannot be used as characters; UTF-16 then uses two surrogates together to represent a code point outside the 16-bit range. ("High" and "low" refer to the surrogates that serve as the first and second code units of such a pair, respectively: U+D800 to U+DBFF are high surrogates and U+DC00 to U+DFFF are low surrogates. Each surrogate is one or the other, never both; experience with older character sets had shown that it is very useful to always be able to tell where one character ends and the next begins.)
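As an illustration of the pairing described above, here is a minimal sketch of how a code point outside the 16-bit range is split into a high and a low surrogate. The function name `surrogatePair(for:)` is hypothetical and not part of the standard library; the arithmetic follows the UTF-16 encoding scheme.

```swift
// Hypothetical helper: split a code point in U+10000...U+10FFFF
// into its UTF-16 surrogate pair. Returns nil for any other value.
func surrogatePair(for value: UInt32) -> (high: UInt16, low: UInt16)? {
    guard (0x10000...0x10FFFF).contains(value) else { return nil }
    let offset = value - 0x10000                 // fits in 20 bits
    let high = UInt16(0xD800 + (offset >> 10))   // upper 10 bits
    let low = UInt16(0xDC00 + (offset & 0x3FF))  // lower 10 bits
    return (high, low)
}

// U+1F600 lies outside the 16-bit range, so it needs two code units.
let examplePair = surrogatePair(for: 0x1F600)

// The standard library's UTF-16 view of the same character,
// shown here for comparison with the manual calculation.
let exampleUnits = Array("\u{1F600}".utf16)
```

Because the high and low ranges do not overlap, a decoder can tell from a single code unit whether it is the first or second half of a pair.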

So the issue is that you are trying to create a UnicodeScalar with a value (57292, i.e. U+DFCC) that the Unicode standard reserves as a surrogate and therefore defines not to be a Unicode scalar value. U+DFCC does not exist as a character; it is only a "surrogate" standing in for half of a scalar that does exist.
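To check which kind of surrogate a given value is, the standard library's `UTF16.isLeadSurrogate(_:)` and `UTF16.isTrailSurrogate(_:)` can be used ("lead" and "trail" are the library's names for high and low). A small sketch; the constant names are only for illustration:

```swift
// 57292 is 0xDFCC, which falls in the low (trail) surrogate range.
let suspectValue: UInt16 = 57292

let suspectIsHigh = UTF16.isLeadSurrogate(suspectValue)   // U+D800...U+DBFF
let suspectIsLow = UTF16.isTrailSurrogate(suspectValue)   // U+DC00...U+DFFF
```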

To prevent this issue, stick to scalar values that do exist: U+0000 to U+D7FF and U+E000 to U+10FFFF.
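One way to apply this is to validate the value before use. The sketch below assumes Swift 3 or later, where the `UnicodeScalar` initializer taking a `UInt32` is failable and returns `nil` instead of trapping; `isScalarValue(_:)` is a hypothetical helper that spells out the two valid ranges explicitly.

```swift
// Hypothetical helper: true only for U+0000...U+D7FF and U+E000...U+10FFFF.
func isScalarValue(_ value: UInt32) -> Bool {
    return value <= 0xD7FF || (0xE000...0x10FFFF).contains(value)
}

// Assumption: Swift 3 or later, where this initializer returns an Optional.
let rejected = UnicodeScalar(UInt32(57292))   // surrogate, so no scalar is produced
let accepted = UnicodeScalar(UInt32(0xE000))  // first value after the surrogate range

// Handle the failure case instead of force-unwrapping.
if let scalar = UnicodeScalar(UInt32(0x41)) {
    print(scalar)
} else {
    print("not a valid Unicode scalar value")
}
```

When the input comes from untrusted or computed data, handling the `nil` case (or checking the range first) avoids the crash entirely.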
