C++: convert an ASCII-escaped Unicode string into a UTF-8 string
Question
I need to read in a standard ASCII string containing Unicode escapes and convert it into a std::string containing the UTF-8 encoded equivalent. For example, "\u03a0" (a std::string with 6 characters) should be converted into a std::string with two characters, 0xce and 0xa0, in raw binary.
I would be happiest with a simple answer using ICU or Boost, but I haven't been able to find one.
(This is similar to "Convert a Unicode string to an escaped ASCII string", but note that I ultimately need to arrive at the UTF-8 encoding. If we can use Unicode code points as an intermediate step, that's fine.)
Recommended answer
(\u03a0 is the escape for the Unicode code point U+03A0, GREEK CAPITAL LETTER PI, whose UTF-8 encoding is 0xCE 0xA0.)
You need to:
- Get the number 0x03a0 from the string "\u03a0": drop the backslash and the u, and parse 03a0 as hexadecimal into a wchar_t. Repeat for each escape until you have a (wide) string.
- Convert 0x03a0 into UTF-8. C++11 has std::codecvt_utf8, which may help.
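The two steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names (append_utf8, parse_u_escape, unescape_to_utf8) are hypothetical, and it encodes UTF-8 by hand instead of using std::codecvt_utf8, which has been deprecated since C++17. It also uses a 32-bit integer rather than wchar_t for the code point, because wchar_t is only 16 bits wide on some platforms. It assumes escapes have exactly four hex digits and that characters outside the BMP arrive as a \uD8xx\uDCxx surrogate pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of code point cp (assumed <= 0x10FFFF) to out.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Try to parse a "\uXXXX" escape starting at s[pos].
// On success, stores the 16-bit value in `value` and returns true.
bool parse_u_escape(const std::string& s, std::size_t pos, std::uint32_t& value) {
    if (pos + 6 > s.size() || s[pos] != '\\' || s[pos + 1] != 'u') return false;
    std::uint32_t v = 0;
    for (std::size_t i = pos + 2; i < pos + 6; ++i) {
        const char c = s[i];
        v <<= 4;
        if (c >= '0' && c <= '9')      v |= static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') v |= static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') v |= static_cast<std::uint32_t>(c - 'A' + 10);
        else return false;  // not a hex digit: treat the text as literal
    }
    value = v;
    return true;
}

// Replace every "\uXXXX" escape in `in` with its UTF-8 bytes.
// Anything that is not a well-formed escape is copied through unchanged.
// Note: a lone surrogate is encoded as-is, without validation.
std::string unescape_to_utf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t cp = 0;
        if (!parse_u_escape(in, i, cp)) {
            out += in[i++];
            continue;
        }
        i += 6;
        // Combine a high surrogate with an immediately following low surrogate.
        std::uint32_t low = 0;
        if (cp >= 0xD800 && cp <= 0xDBFF && parse_u_escape(in, i, low) &&
            low >= 0xDC00 && low <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i += 6;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, calling unescape_to_utf8("\\u03a0") on the 6-character input from the question yields a 2-byte std::string holding 0xCE 0xA0.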