What is the Unicode U+001A Character? Aka 0x1A

Question

The U+001A character appears frequently in error messages relating to character encoding. What is the U+001A character?

Recommended answer

U+001A is defined in the Unicode Standard as a control character with the name SUBSTITUTE, and it belongs to a group characterized as follows, in chapter 16 of the standard: "There are 65 code points set aside in the Unicode Standard for compatibility with the C0 and C1 control codes defined in the ISO/IEC 2022 framework [...] The Unicode Standard provides for the intact interchange of these code points, neither adding to nor subtracting from their semantics. The semantics of the control codes are generally determined by the application with which they are used. However, in the absence of specific application uses, they may be interpreted according to the control function semantics specified in ISO/IEC 6429:1992."

ISO 6429 is effectively equivalent to ECMA 48, which also gives this code the short name SUB and defines it as follows: "SUB is used in the place of a character that has been found to be invalid or in error. SUB is intended to be introduced by automatic means." This reflects the definition of this control code in ASCII.

Thus, in general, U+001A may be used to indicate a character-level data error, such as the presence of bytes, in purported character data, that have no interpretation in the character encoding being applied. Loosely speaking, it would thus mean "bad character data", but more appropriately "malformed data, when trying to interpret data as characters". However, in Unicode, U+FFFD REPLACEMENT CHARACTER is more appropriate, as it has specific Unicode semantics.
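As a rough illustration of the difference between the two characters, the following Python sketch inspects both code points and shows a decoder substituting U+FFFD for bytes it cannot interpret. The sample byte string and the variable names are assumptions chosen for the example. Other decoders and converters may behave differently, and some legacy tools may emit 0x1A instead.

```python
import unicodedata

SUB = "\u001a"          # U+001A, the C0 control code SUBSTITUTE
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# U+001A is a control character: general category "Cc".
print(unicodedata.category(SUB))

# U+FFFD is an ordinary named symbol with Unicode-defined semantics.
print(unicodedata.name(REPLACEMENT))

# Hypothetical input: "café" encoded as Latin-1, which is not valid UTF-8.
bad_bytes = b"caf\xe9"

# With errors="replace", Python's UTF-8 decoder marks the malformed
# byte with U+FFFD rather than with U+001A.
decoded = bad_bytes.decode("utf-8", errors="replace")
print(repr(decoded))
```

When U+001A turns up in text, one plausible explanation is that an earlier conversion step (not the decoder shown here) replaced data it could not map.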

Since the question has been tagged with "xml", it needs to be noted that in XML 1.0, U+001A is forbidden, by clause 2.2 Characters. Note that the comment "any Unicode character, excluding the surrogate blocks, FFFE, and FFFF" is misleading (but comments are non-normative); U+001A is a Unicode character, though it is not a graphic character and its effect is not defined in the Unicode Standard.
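The restriction can be sketched in Python by transcribing the `Char` production from clause 2.2 of XML 1.0 into a predicate. The helper names `is_xml10_char` and `sanitize_for_xml10` are hypothetical, and replacing forbidden characters with U+FFFD is only one possible policy. An application might prefer to reject the input outright.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def sanitize_for_xml10(text: str, replacement: str = "\ufffd") -> str:
    """Replace every character that XML 1.0 forbids (assumed policy)."""
    return "".join(ch if is_xml10_char(ord(ch)) else replacement for ch in text)


raw = "x\u001ay"  # hypothetical text containing U+001A

# A conforming XML 1.0 parser is expected to reject the raw character.
try:
    ET.fromstring(f"<a>{raw}</a>")
    print("parsed (unexpected for a conforming parser)")
except ET.ParseError as err:
    print("rejected:", err)

# After sanitizing, the same content parses.
clean = sanitize_for_xml10(raw)
print(repr(ET.fromstring(f"<a>{clean}</a>").text))
```

Running a check like this before serializing text to XML is one way to avoid encoding-related parse errors caused by a stray U+001A.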
