How to know the number of characters in a UTF-8 string
Question
I want to know whether there is a simple way to determine the number of characters in a UTF-8 string.
For example, on Windows it can be done by:
- converting the UTF-8 string to a wchar_t string
- calling the wcslen function on it and using the result
But I need a simpler, cross-platform solution.
Thanks in advance.
Recommended answer
UTF-8 characters are either single bytes whose left-most bit is 0, or multiple bytes where the first byte has the form 1..10... (two or more 1s on the left, then a 0), followed by continuation bytes of the form 10... (i.e. a single 1 on the left). Assuming your string is well-formed, you can loop over all the bytes and increment your "character count" every time you see a byte that is not of the form 10..., i.e. count only the first byte of each UTF-8 character.
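A minimal C sketch of the counting loop described above, assuming a NUL-terminated, well-formed UTF-8 string. The function name `utf8_strlen` is hypothetical (it is not a standard library function). A byte is a continuation byte exactly when its top two bits are `10`, which is tested with `(b & 0xC0) == 0x80`.

```c
#include <stddef.h>

/* Hypothetical helper: counts UTF-8 code points in a NUL-terminated string.
 * Every byte that is NOT a continuation byte (10xxxxxx) starts a new
 * character, so only lead bytes and single-byte (ASCII) characters are counted.
 * Assumes the input is well-formed UTF-8; no validation is performed. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    /* Use unsigned char so the bit test does not depend on char signedness. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, `utf8_strlen("h\xC3\xA9llo")` ("héllo", where é takes two bytes) returns 5. Note that this counts Unicode code points: a character outside the Basic Multilingual Plane such as U+1F600 counts as 1 here, whereas `wcslen` on Windows (where `wchar_t` is 16-bit UTF-16) would count it as 2, and a base letter plus a combining accent counts as 2 in both.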