How to count characters in a Unicode string in C

Question

Let's say I have a string:

char theString[] = "你们好āa";

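Given that my encoding is UTF-8, this string is 12 bytes long (each of the three Chinese characters is three bytes, the Latin character with the macron is two bytes, and the 'a' is one byte):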

strlen(theString) == 12

How can I count the number of characters? How can I do the equivalent of subscripting, so that:

theString[3] == "好"

How can I slice and concatenate such strings?

Recommended answer

Count only the bytes whose top two bits are not set to 10 (i.e., every byte less than 0x80 or greater than 0xBF).

That's because every byte with its top two bits set to 10 is a UTF-8 continuation byte.
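
The counting rule above can be sketched as follows. This is a minimal illustration, not a validated decoder: the name utf8_strlen is hypothetical, and the function assumes its input is well-formed, NUL-terminated UTF-8.

```c
#include <stddef.h>

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
 * Hypothetical helper: assumes s is valid, NUL-terminated UTF-8 and
 * performs no validation. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        /* (byte & 0xC0) == 0x80 means the top two bits are 10. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the 12-byte string from the question, utf8_strlen(theString) should return 5. Note that this counts code points, which is not always the same as what a user perceives as a character (combining marks count separately).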

See here for a description of the encoding and how strlen can work on a UTF-8 string.

For slicing and dicing UTF-8 strings, you basically have to follow the same rules. Any byte starting with a 0 bit or a 11 sequence is the start of a UTF-8 code point; all others are continuation bytes.
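
As a rough sketch of how that rule gives you something like subscripting, the helper below walks the string and returns a pointer to the start byte of the code point at a given zero-based index. The names utf8_is_start and utf8_index are hypothetical, and the input is assumed to be well-formed UTF-8.

```c
#include <stddef.h>

/* True if byte b starts a code point (0xxxxxxx or 11xxxxxx). */
static int utf8_is_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Return a pointer to the first byte of the code point at zero-based
 * position `index`, or NULL if the string has fewer code points.
 * Hypothetical helper: assumes valid, NUL-terminated UTF-8. */
const char *utf8_index(const char *s, size_t index)
{
    for (; *s != '\0'; s++) {
        if (utf8_is_start((unsigned char)*s)) {
            if (index == 0)
                return s;
            index--;
        }
    }
    return NULL;
}
```

Because each code point occupies a variable number of bytes, this lookup is a linear scan rather than a constant-time array access. The result points into the original buffer; the code point it refers to ends just before the next start byte (or the terminating NUL).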

Your best bet, if you don't want to use a third-party library, is to simply provide functions along the lines of:

utf8left (char *destbuff, char *srcbuff, size_t sz);
utf8mid  (char *destbuff, char *srcbuff, size_t pos, size_t sz);
utf8rest (char *destbuff, char *srcbuff, size_t pos);

to get, respectively:

  • the leftmost sz UTF-8 characters (code points) of a string.
  • the sz UTF-8 characters of a string, starting at pos.
  • the rest of the UTF-8 characters of a string, starting at pos.

These make a decent set of building blocks for manipulating the strings sufficiently for your purposes.
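
One possible way to fill in those three functions is sketched below. Several things here are assumptions rather than part of the answer: sz and pos are interpreted as counts of code points (pos being zero-based), the return type is void, the source pointer is const, the helper utf8_skip is hypothetical, and destbuff is assumed to be large enough (strlen(srcbuff) + 1 bytes always suffices). No validation of the input is attempted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: advance p past n code points, stopping early
 * at the terminating NUL. Assumes valid UTF-8. */
static const char *utf8_skip(const char *p, size_t n)
{
    while (n > 0 && *p != '\0') {
        p++;                                       /* step over the start byte */
        while (((unsigned char)*p & 0xC0) == 0x80) /* and its continuation bytes */
            p++;
        n--;
    }
    return p;
}

/* Copy sz code points of srcbuff, starting at code point pos. */
void utf8mid(char *destbuff, const char *srcbuff, size_t pos, size_t sz)
{
    const char *start = utf8_skip(srcbuff, pos);
    const char *end = utf8_skip(start, sz);
    size_t nbytes = (size_t)(end - start);

    memcpy(destbuff, start, nbytes);
    destbuff[nbytes] = '\0';
}

/* Copy the leftmost sz code points of srcbuff. */
void utf8left(char *destbuff, const char *srcbuff, size_t sz)
{
    utf8mid(destbuff, srcbuff, 0, sz);
}

/* Copy everything from code point pos to the end of srcbuff. */
void utf8rest(char *destbuff, const char *srcbuff, size_t pos)
{
    strcpy(destbuff, utf8_skip(srcbuff, pos));
}
```

With the string from the question, utf8left(buf, theString, 2) would copy "你们" (6 bytes) into buf. Concatenation needs no special handling: joining two valid UTF-8 strings byte for byte, for example with strcat, yields valid UTF-8, so only slicing has to respect code point boundaries.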
