Handling UTF-8 strings

Question

As far as I know, Linux uses UTF-8 encoding. This means I can use std::string for handling strings, right? The encoding will just be UTF-8.

In UTF-8, some characters take 1 byte and others take 2, 3, or 4 bytes. My question is: how do you deal with UTF-8 encoded strings on Linux using C++?

In particular: how would you get the length of a string, in bytes or in number of characters? How would you traverse the string?

The reason I am asking is that, as I said, a UTF-8 character may be more than one byte long. So myString[7] and myString[8] might not refer to two different characters. Also, the fact that a UTF-8 string is ten bytes long doesn't say much about its number of characters, right?

Recommended answer

You cannot handle UTF-8 with std::string. Despite its name, std::string is only a container for bytes. It is not a type for text storage (beyond the fact that a byte buffer can obviously store any object, including text). It doesn't even store characters: a char is a byte, not a character.
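To make the byte-versus-character distinction concrete, here is a minimal sketch. The helper name count_code_points is hypothetical (it is not part of the standard library), and it assumes the input is already well-formed UTF-8. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 byte string
// by counting every byte that is NOT a continuation byte (10xxxxxx).
// Assumes well-formed UTF-8; performs no validation.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the string "naïve €" stored as UTF-8, size() reports 10 (bytes), while count_code_points reports 7. Even that number is a count of code points, which is not necessarily the number of characters a user perceives (combining marks, for example, are separate code points).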

You need to venture outside the standard library if you want to actually handle (rather than just store) Unicode characters. Traditionally, this is done with libraries such as ICU.
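As a rough illustration of the lowest layer such a library has to provide, the sketch below traverses a std::string code point by code point. The function name decode_utf8 is hypothetical and is not ICU's API. It assumes mostly well-formed input: it stops at an invalid lead byte or a truncated sequence, and it does not reject overlong encodings or surrogates. Normalization, case mapping and grapheme segmentation are not covered at all, which is where a real Unicode library earns its keep.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes a UTF-8 byte string into code points.
// Sketch only: stops at the first invalid lead byte or truncated sequence,
// and does not check continuation bytes, overlong forms, or surrogates.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }  // 0xxxxxxx
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }  // 11110xxx
        else break;                      // stray continuation or invalid lead byte
        if (i + len > s.size()) break;   // sequence runs past the end of the buffer
        for (std::size_t k = 1; k < len; ++k) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Iterating over the returned vector visits whole code points, so index 7 and index 8 always refer to different code points, unlike myString[7] and myString[8] on the raw bytes.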

However, while ICU is a mature library, its C++ interface sucks. Ogonek takes a more modern approach. It is not as well established and is still a work in progress, but it provides a much nicer interface.
