Are all Kanji characters in UTF-8 3 bytes long?
Question
Can someone please confirm that all Chinese characters (Hanzi, or Kanji in Japanese) are 3 bytes long in UTF-8?
Recommended answer
The commonly used Hanzi/Kanji characters are in the "CJK Unified Ideographs" block, U+4E00 through U+9FFF, and take 3 bytes each in UTF-8. (The Japanese Hiragana and Katakana characters also take 3 bytes.)
However, there are also some very rarely used characters in the "CJK Unified Ideographs Extension B" and "CJK Compatibility Ideographs Supplement" blocks. These lie above U+FFFF, outside the Basic Multilingual Plane, so they take 4 bytes in UTF-8.
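The byte counts above can be checked directly. The following is a minimal Python sketch, not part of the original answer; the helper name `utf8_len` and the sample code points are chosen here purely for illustration.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative sample code points, one per block mentioned in the answer.
samples = {
    "\u4e2d": "CJK Unified Ideographs",
    "\u3042": "Hiragana",
    "\u30a2": "Katakana",
    "\U00020000": "CJK Unified Ideographs Extension B",
    "\U0002F800": "CJK Compatibility Ideographs Supplement",
}

for ch, block in samples.items():
    # Print the code point, its UTF-8 length, and the block it belongs to.
    print(f"U+{ord(ch):04X}  {utf8_len(ch)} bytes  {block}")
```

The first three samples should report 3 bytes and the last two 4 bytes, since UTF-8 uses 3 bytes for U+0800 through U+FFFF and 4 bytes for everything above U+FFFF.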
Also be aware that Chinese text often contains ASCII characters such as the digits 0-9, which take 1 byte each.
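Because of this mix, the byte length of a string cannot be computed as "3 times the number of characters". A small Python sketch of a per-character breakdown follows; the function name `utf8_breakdown` and the sample string are hypothetical, picked only to illustrate the point.

```python
def utf8_breakdown(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of bytes it needs in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# Four ASCII digits followed by one CJK ideograph (U+5E74).
text = "2024\u5e74"

for ch, size in utf8_breakdown(text):
    print(f"U+{ord(ch):04X}: {size} byte(s)")

# Total encoded size: 4 digits at 1 byte each plus 1 ideograph at 3 bytes.
print(len(text), "characters,", len(text.encode("utf-8")), "bytes")
```

Here the string has 5 characters but encodes to 7 bytes, so any buffer or column sizing should be based on the encoded length rather than the character count.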