Character Encoding: how to check whether a character is single-byte or multi-byte
Question
Hi,
I have a question about single-byte and multi-byte characters. I have seen somewhere how to check whether a character is single byte, double byte, or triple byte, but I didn't understand it.
Let b be the character we need to check.
For a single-byte character: (b & 0x80) == 0x00;
For a double-byte character: (b & 0xE0) == 0xC0;
For a triple-byte character: (b & 0xF0) == 0xE0;
Can anyone please explain the logic behind these?
Thanks in advance.
Recommended answer
See the UTF-8 encoding article at Wikipedia. According to the table there, (the first byte of) a single-byte character has its most significant bit cleared (0). You can test this condition by ANDing the byte with 0x80 (that is, 10000000 in binary).
Similarly, the first byte of every two-byte character starts with the 110 marker, and you can test for it with (b & 0xE0) == 0xC0 (that is, (b & 11100000b) == 11000000b). The parentheses are needed in C and C++, because == binds more tightly than &.
And so on.
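The checks above extend the same way to four-byte lead bytes (11110xxx) and to continuation bytes (10xxxxxx). A minimal sketch in C, assuming b is a single byte taken from a UTF-8 stream; the function name utf8_sequence_length and its return convention are hypothetical and not part of the original thread:

```c
/* Classify one byte by its high bits, following the UTF-8 table.
 * Returns 1, 2, 3, or 4 for the lead byte of a sequence of that length,
 * 0 for a continuation byte (10xxxxxx), and -1 for a byte that cannot
 * appear in UTF-8 at all (11111xxx).
 * Hypothetical helper: it does not reject overlong lead bytes
 * (0xC0, 0xC1) or lead bytes above 0xF4. */
int utf8_sequence_length(unsigned char b)
{
    /* Parentheses matter: == binds tighter than & in C, so
     * b & 0x80 == 0x00 would parse as b & (0x80 == 0x00). */
    if ((b & 0x80) == 0x00) return 1;   /* 0xxxxxxx */
    if ((b & 0xC0) == 0x80) return 0;   /* 10xxxxxx, continuation */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return -1;
}
```

Each mask keeps the marker bits plus the 0 bit that terminates the marker, which is why the mask is always one bit wider than the pattern it is compared against.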
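What you can do is use
int noOfBytes = sizeof(b);
Then you will know how many bytes b occupies. Note that sizeof reports the storage size of b's type (always 1 for a char), not the number of bytes in the character's UTF-8 encoding.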
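To measure the encoded length of a character rather than the size of a type, one option is to count the lead byte plus the continuation bytes that follow it. A rough sketch, assuming the input is a valid, NUL-terminated UTF-8 string; the name first_char_bytes is hypothetical:

```c
#include <stddef.h>

/* Count the bytes of the first UTF-8 character in s: the lead byte
 * plus every continuation byte (10xxxxxx) that directly follows it.
 * Assumption: s is valid, NUL-terminated UTF-8 (no validation here).
 * Returns 0 for an empty string. */
size_t first_char_bytes(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n;

    if (p[0] == 0)
        return 0;                       /* empty string */

    n = 1;                              /* the lead byte */
    while ((p[n] & 0xC0) == 0x80)       /* continuation bytes */
        n++;
    return n;                           /* the NUL terminator ends the loop */
}
```

With this helper, a string starting with the bytes 0xC3 0xA9 yields 2, while sizeof applied to any single char still yields 1.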
You can find more information here:
http://en.wikipedia.org/wiki/Character_encoding
and here:
http://en.wikipedia.org/wiki/UTF-16