Character Encoding: how to check whether a character is single-byte or multi-byte


Question




Hi,
I have a question about single-byte and multi-byte characters. I have seen somewhere how to check whether a character is single-byte, double-byte, or triple-byte, but I didn't understand it.
Let b be the character we need to check:
For a single-byte character: (b & 0x80) == 0x00;
For a double-byte character: (b & 0xE0) == 0xC0;
For a triple-byte character: (b & 0xF0) == 0xE0;

Can anyone please explain the logic behind these?

Thanks in advance.

Recommended answer

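See the UTF-8 encoding article on Wikipedia. According to the table there, (the first byte of) a single-byte character has its most significant bit cleared (0). You can test for that condition by ANDing the byte with 0x80 (10000000 in binary) and checking that the result is 0.
Similarly, the first byte of every two-byte character starts with the 110 marker, which you can test with (b & 0xE0) == 0xC0 (that is, (b & 11100000b) == 11000000b). The parentheses matter in C and C++, where == binds more tightly than &.
And so on: the first byte of a three-byte character starts with the 1110 marker, hence the mask 0xF0 and the expected value 0xE0.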
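The masks described above can be collected into one small function. The following is only a sketch in C: the name `utf8_seq_len` is hypothetical, the four-byte case (mask 0xF8, value 0xF0) is added by analogy with the Wikipedia table, and the function inspects the lead byte only, so it does not reject lead bytes that are never valid in UTF-8 (such as 0xC0 or 0xF5) or check the continuation bytes that follow.

```c
/* Hypothetical helper (not from the original thread): returns the length in
   bytes of the UTF-8 sequence introduced by lead byte b, or 0 if b cannot
   start a sequence (continuation byte 10xxxxxx or an unknown pattern). */
static int utf8_seq_len(unsigned char b)
{
    if ((b & 0x80) == 0x00) return 1;  /* 0xxxxxxx: single byte (ASCII) */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: two-byte sequence   */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: three-byte sequence */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: four-byte sequence  */
    return 0;                          /* not a lead byte               */
}
```

For example, `utf8_seq_len(0x41)` (the letter A) would return 1, while `utf8_seq_len(0xE2)` (the first byte of the euro sign, E2 82 AC) would return 3. Taking the parameter as `unsigned char` avoids sign-extension surprises on platforms where plain `char` is signed.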


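What you can do is use
int noOfBytes = sizeof(b);

Then you will know how many bytes the variable b occupies. Note that this is the storage size of b's type (always 1 for a char), not the length of the UTF-8 sequence that b begins.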
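Since `sizeof` is evaluated on the type of its operand rather than on the encoded text, a loop over the bytes may be closer to what the question is after. The sketch below is an illustration only: the name `utf8_count_chars` is hypothetical, and it assumes the input is a valid, NUL-terminated UTF-8 string. It counts characters by skipping continuation bytes, which always have the form 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original thread): counts the code points
   in a NUL-terminated string, assuming the string is valid UTF-8. Every byte
   that is not a continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_count_chars(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

With the six-byte string "A\xC3\xA9\xE2\x82\xAC" (A, é, €), this function would report 3 characters, whereas `sizeof(char)` is 1 regardless of what the string contains.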



You can find more information here:

http://en.wikipedia.org/wiki/Character_encoding

and here:

http://en.wikipedia.org/wiki/UTF-16

