Are all Kanji characters 3 bytes long in UTF-8?


Question

Can someone please confirm that all Chinese characters (Hanzi, or Kanji in Japanese) are 3 bytes long in UTF-8?

Recommended answer

The commonly used Hanzi/Kanji characters are in the "CJK Unified Ideographs" block, U+4E00 to U+9FFF, and take 3 bytes each in UTF-8. (The Japanese Hiragana and Katakana characters also take 3 bytes.)
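The 3-byte claim is easy to check empirically. The following is a minimal Python sketch, not part of the original answer; the helper name `utf8_len` and the sample characters are illustrative choices.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))

# Sample characters (arbitrary picks): two ideographs, one Hiragana, one Katakana.
samples = ["中", "漢", "あ", "カ"]
for ch in samples:
    print(f"U+{ord(ch):04X} {ch}: {utf8_len(ch)} bytes")

# Collect the distinct byte lengths across the whole
# CJK Unified Ideographs block (U+4E00 to U+9FFF).
block_lengths = {utf8_len(chr(cp)) for cp in range(0x4E00, 0xA000)}
print(block_lengths)
```

Every code point in that block yields the same length, so the set contains the single value 3.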

However, there are also some very rarely used characters in the "CJK Unified Ideographs Extension B" and "CJK Compatibility Ideographs Supplement" blocks. These lie above U+FFFF, so they take 4 bytes in UTF-8.
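The byte count follows directly from the code point value, which is why these supplementary blocks need 4 bytes. Here is a short Python sketch of the UTF-8 length rule; the function name `utf8_len_from_code_point` is hypothetical, and the sketch assumes a valid, non-surrogate code point.

```python
def utf8_len_from_code_point(cp: int) -> int:
    """Return the UTF-8 encoded length for a code point, based on its range.

    Assumes cp is a valid Unicode scalar value (not a surrogate).
    """
    if cp < 0x80:
        return 1  # ASCII
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3  # rest of the Basic Multilingual Plane, including U+4E00 to U+9FFF
    return 4      # supplementary planes, U+10000 and above

# First code points of the two blocks named above.
ext_b_start = 0x20000         # CJK Unified Ideographs Extension B
compat_supp_start = 0x2F800   # CJK Compatibility Ideographs Supplement

for cp in (0x4E00, ext_b_start, compat_supp_start):
    actual = len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: rule says {utf8_len_from_code_point(cp)}, encoder says {actual}")
```

The range rule and Python's own encoder agree on each of these code points.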

Also be aware that Chinese text often contains ASCII characters such as the digits 0-9, which take 1 byte each.
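In practice this means the byte length of a string cannot be derived by multiplying its character count by 3. A small Python illustration, using a made-up sample string:

```python
# Sample text mixing two ideographs with two ASCII digits ("Chapter 10").
text = "第10章"

char_count = len(text)                  # number of code points
byte_count = len(text.encode("utf-8"))  # number of UTF-8 bytes

print(char_count, byte_count)  # prints "4 8": 3 + 1 + 1 + 3 bytes
```

When sizing a buffer or a database column, measuring the encoded bytes is safer than estimating from the character count.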
