C++: convert an ASCII-escaped Unicode string into a UTF-8 string


Question

I need to read in a standard ASCII-style string containing Unicode escapes and convert it into a std::string holding the UTF-8 encoded equivalent. For example, "\u03a0" (a std::string with 6 characters) should be converted into a std::string with two characters, 0xce and 0xa0, as raw binary.

I would be happiest with a simple answer using ICU or Boost, but I haven't been able to find one.

(This is similar to "Convert a Unicode string to an escaped ASCII string", but note that I ultimately need to arrive at the UTF-8 encoding. Using Unicode as an intermediate step is fine.)

Recommended answer

(\u03a0 denotes U+03A0, the Unicode code point for GREEK CAPITAL LETTER PI, whose UTF-8 encoding is 0xCE 0xA0.)
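
To see where 0xCE 0xA0 comes from: code points from U+0080 to U+07FF use the two-byte UTF-8 pattern 110xxxxx 10xxxxxx, with the top 5 bits going into the first byte and the low 6 bits into the second. A minimal sketch of that arithmetic follows; encode_two_byte is a hypothetical helper name, not something from ICU or Boost.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes a code point in U+0080..U+07FF as two UTF-8 bytes.
std::array<std::uint8_t, 2> encode_two_byte(std::uint32_t cp) {
    assert(cp >= 0x80 && cp <= 0x7FF);  // only this range uses the two-byte form
    return {
        static_cast<std::uint8_t>(0xC0 | (cp >> 6)),    // 110xxxxx: top 5 bits
        static_cast<std::uint8_t>(0x80 | (cp & 0x3F)),  // 10xxxxxx: low 6 bits
    };
}
```

For 0x03A0, cp >> 6 is 0x0E and cp & 0x3F is 0x20, so the two bytes are 0xC0 | 0x0E = 0xCE and 0x80 | 0x20 = 0xA0.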

You need to:

  1. Get the number 0x03a0 from the string "\u03a0": drop the backslash and the u, then parse 03a0 as hexadecimal into a wchar_t. Repeat for each escape until you have a (wide) string.
  2. Convert 0x3a0 into UTF-8. C++11 has std::codecvt_utf8, which may help (note that it is deprecated as of C++17).
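
The two steps above can be sketched without any library by parsing the hex digits by hand and emitting the UTF-8 bytes directly. This is only one possible approach, under stated assumptions: the function names (parse_escape, append_utf8, unescape_to_utf8) are hypothetical, only \uXXXX escapes with exactly four hex digits are recognized, no other backslash escapes are processed, a high/low surrogate pair of escapes is treated as one supplementary code point, and an unpaired surrogate is replaced with U+FFFD. It uses a 32-bit integer rather than wchar_t, since wchar_t is only 16 bits wide on some platforms.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Step 1: parses "\uXXXX" at in[pos]; returns false if no well-formed escape is there.
bool parse_escape(const std::string& in, std::size_t pos, std::uint32_t& value) {
    if (pos + 6 > in.size() || in[pos] != '\\' || in[pos + 1] != 'u') return false;
    value = 0;
    for (std::size_t i = pos + 2; i < pos + 6; ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (!std::isxdigit(c)) return false;
        value = value * 16 + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
    }
    return true;
}

// Step 2: appends the UTF-8 encoding of one code point (at most U+10FFFF) to out.
void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replaces each well-formed \uXXXX escape with its UTF-8 bytes; other text is copied unchanged.
std::string unescape_to_utf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t cp = 0;
        if (!parse_escape(in, i, cp)) {
            out += in[i++];  // ordinary character: copy as-is
            continue;
        }
        i += 6;
        std::uint32_t lo = 0;
        if (cp >= 0xD800 && cp <= 0xDBFF && parse_escape(in, i, lo) &&
            lo >= 0xDC00 && lo <= 0xDFFF) {
            // Assumption: two escaped surrogates combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 6;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute the replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, unescape_to_utf8("\\u03a0") yields a two-byte std::string holding 0xCE 0xA0, matching the example in the question. If ICU or Boost is already a dependency, its conversion facilities may be a better fit than hand-rolled code.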
