Converting C++ std::wstring to utf8 with std::codecvt_xxx

Question

C++11 has tools to convert wide-character strings (std::wstring) to and from UTF-8: std::codecvt, std::codecvt_utf8, std::codecvt_utf8_utf16, etc.

Which one can a Windows app use to convert regular wide-char Windows strings (std::wstring) to a UTF-8 std::string? Does it always work without configuring locales?

Recommended answer

It depends on how you convert them.
You need to specify the source encoding and the target encoding.
wstring is not a format; it just defines a data type.
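Because std::wstring only fixes the element type (wchar_t), it is the facet handed to std::wstring_convert that states the two encodings. The following is a minimal sketch, not the original answer's code. It assumes the usual platform conventions: a 16-bit wchar_t holding UTF-16 on Windows and a 32-bit wchar_t holding UTF-32 on Linux/glibc. Under that assumption, it picks the facet at compile time. The names WideUtf8Facet, wide_to_utf8 and utf8_to_wide are hypothetical. Note that std::wstring_convert and the <codecvt> facets were deprecated in C++17, although major standard libraries still ship them.

```cpp
// MSVC reports the C++17 <codecvt> deprecation as warning C4996; this macro silences it.
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical alias (assumption: 16-bit wchar_t means UTF-16, otherwise UTF-32):
// codecvt_utf8_utf16 treats the wide side as UTF-16 (surrogate pairs allowed),
// codecvt_utf8 treats each wchar_t as one code point (UCS-4/UTF-32).
using WideUtf8Facet = std::conditional<sizeof(wchar_t) == 2,
                                       std::codecvt_utf8_utf16<wchar_t>,
                                       std::codecvt_utf8<wchar_t>>::type;

// Hypothetical helper: native wide string -> UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<WideUtf8Facet> convert;
    return convert.to_bytes(wide);
}

// Hypothetical helper: UTF-8 bytes -> native wide string.
// Throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<WideUtf8Facet> convert;
    return convert.from_bytes(utf8);
}
```

On Windows this reduces to the std::codecvt_utf8_utf16<wchar_t> facet that the answer uses. The compile-time switch only matters if the same code also has to build where wchar_t is 32 bits.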

Now, usually when people say "Unicode" they mean UTF16, which is what Microsoft Windows uses, and on Windows that is usually what a wstring contains.
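To see that the contents of a wstring depend on the platform, the sketch below counts the wchar_t units used for one character outside the Basic Multilingual Plane (U+1F600). It assumes the common compilers. On MSVC, wchar_t is 2 bytes and the literal is stored as a UTF-16 surrogate pair. With GCC or Clang on Linux, wchar_t is 4 bytes and the literal is a single UTF-32 unit. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: how many wchar_t units the compiler stores for U+1F600.
// Expected: 2 where wchar_t is 16-bit (UTF-16 surrogate pair, e.g. Windows),
// 1 where wchar_t is 32-bit (one UTF-32 code unit, e.g. Linux/glibc).
std::size_t units_for_emoji() {
    const std::wstring s = L"\U0001F600";
    return s.size();
}

// Hypothetical helper: true if s is exactly one UTF-16 surrogate pair
// (a high surrogate D800-DBFF followed by a low surrogate DC00-DFFF).
bool is_surrogate_pair(const std::wstring& s) {
    return s.size() == 2 &&
           s[0] >= 0xD800 && s[0] <= 0xDBFF &&
           s[1] >= 0xDC00 && s[1] <= 0xDFFF;
}
```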

So, the right way to convert from UTF8 to UTF16:

     #include <codecvt>
     #include <locale>
     #include <string>

     std::string utf8String = "blah blah";

     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
     std::wstring utf16String = convert.from_bytes( utf8String );

And vice versa:

     std::wstring utf16String = L"blah blah";

     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
     std::string utf8String = convert.to_bytes( utf16String );
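The two snippets above can be combined into a small self-contained sketch that also handles malformed input. When no error strings are passed to its constructor, std::wstring_convert::from_bytes throws std::range_error if the bytes are not valid UTF-8. The helper names try_utf8_to_utf16 and utf16_to_utf8 are hypothetical. The test text deliberately stays inside the Basic Multilingual Plane, so the result is the same whether wchar_t is 16 or 32 bits wide.

```cpp
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING  // MSVC: silence C4996 for <codecvt>
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: UTF-8 -> UTF-16 stored in a std::wstring.
// Returns false (leaving `out` unchanged) if the input is not valid UTF-8.
bool try_utf8_to_utf16(const std::string& utf8, std::wstring& out) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    try {
        out = convert.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}

// Hypothetical helper: UTF-16 in a std::wstring -> UTF-8.
// to_bytes also throws std::range_error if the conversion fails.
std::string utf16_to_utf8(const std::wstring& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    return convert.to_bytes(utf16);
}
```

If throwing is undesirable, wstring_convert can also be constructed with replacement strings. It then returns those replacement strings instead of throwing.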

And to add to the confusion:
When you use std::string on a Windows platform (for example, in a multibyte build), the text it holds is NOT UTF8 by default. Windows treats narrow strings as ANSI.
More specifically, Windows uses the default code page (the ANSI code page) that your installation is configured with.
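A narrow std::string on Windows typically holds text in the active ANSI code page, so std::codecvt_utf8_utf16 cannot convert it directly. One common approach is sketched here; it is not taken from the original answer. It goes through UTF-16 with the Win32 functions MultiByteToWideChar and WideCharToMultiByte, using CP_ACP (the active ANSI code page) and CP_UTF8. The helper names are hypothetical, and the whole block compiles to nothing outside Windows.

```cpp
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow string in `codePage` -> UTF-16.
std::wstring to_utf16(const std::string& s, UINT codePage) {
    if (s.empty()) return std::wstring();
    const int n = static_cast<int>(s.size());
    // First call asks for the required length, second call converts.
    const int len = MultiByteToWideChar(codePage, 0, s.data(), n, nullptr, 0);
    if (len == 0) throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring out(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(codePage, 0, s.data(), n, &out[0], len);
    return out;
}

// Hypothetical helper: UTF-16 -> narrow string in `codePage`.
std::string from_utf16(const std::wstring& w, UINT codePage) {
    if (w.empty()) return std::string();
    const int n = static_cast<int>(w.size());
    const int len = WideCharToMultiByte(codePage, 0, w.data(), n, nullptr, 0, nullptr, nullptr);
    if (len == 0) throw std::runtime_error("WideCharToMultiByte failed");
    std::string out(static_cast<std::size_t>(len), '\0');
    WideCharToMultiByte(codePage, 0, w.data(), n, &out[0], len, nullptr, nullptr);
    return out;
}

// Hypothetical helper: ANSI (active code page) -> UTF-8, via UTF-16.
std::string ansi_to_utf8(const std::string& ansi) {
    return from_utf16(to_utf16(ansi, CP_ACP), CP_UTF8);
}
#endif
```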

Also, note that wstring is not the same thing as UTF-16.

When compiling for Unicode, the Windows API functions expect these formats:

CommandA - multibyte - ANSI
CommandW - Unicode - UTF16
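The W functions take UTF-16, so UTF-8 text has to be widened before the call. The following is a minimal sketch, assuming a Windows build. It uses GetFileAttributesW merely as an example of a W function and converts with std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> first. The helper path_exists_utf8 is hypothetical, and the block compiles to nothing outside Windows.

```cpp
#ifdef _WIN32
#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING  // MSVC: silence C4996 for <codecvt>
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: does a file or directory exist, given a UTF-8 path?
bool path_exists_utf8(const std::string& utf8Path) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    const std::wstring widePath = convert.from_bytes(utf8Path);  // UTF-8 -> UTF-16
    // Call the W variant explicitly so the result does not depend on whether UNICODE is defined.
    return GetFileAttributesW(widePath.c_str()) != INVALID_FILE_ATTRIBUTES;
}
#endif
```

Calling the W variant by name, instead of the unsuffixed macro, keeps the behaviour the same in both multibyte and Unicode builds.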