Is a wstring character Unicode? What happens during conversion?

Question

Recently I keep coming across conversions from UTF-8 encoded text to std::string and vice versa. I understood that UTF-8 can encode almost every character in the world, whereas with char, the built-in data type behind std::string, only ASCII values can be stored. In UTF-8 a character takes from 1 to 4 bytes in memory, but a char is usually 1 byte.
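
To make the 1-to-4-byte range concrete, here is a minimal sketch of how a single Unicode code point is encoded as UTF-8 into a std::string. `appendUtf8` is a hypothetical helper written for illustration, not a standard library function, and it assumes its input is a valid code point.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to `out`.
// Precondition: `cp` is a Unicode scalar value (<= U+10FFFF, not a surrogate).
void appendUtf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                // 1 byte:  0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

For example, 'A' produces 1 byte, U+00E9 ('é') the 2 bytes C3 A9, U+20AC ('€') 3 bytes and U+1F600 (an emoji) 4 bytes.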

My question is: what happens when converting from wstring to string, or from wchar_t to char? Are characters that require more than one byte skipped? It seems to depend on the implementation, but I want to know the correct way of doing it.
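
With the most common naive conversion, characters are not skipped; they are truncated. The sketch below (the names `naiveNarrow` and `demoNaiveNarrow` are hypothetical) builds a std::string directly from the wchar_t range, which narrows each wide character to a single char. The result of an out-of-range conversion to a signed char was implementation-defined before C++20; mainstream compilers keep the low 8 bits, which is what the last assertion assumes.

```cpp
#include <cassert>
#include <string>

// Naive "conversion": construct a std::string from the wchar_t range.
// Each wchar_t is implicitly narrowed to char; nothing is dropped, but
// every character above 0xFF loses its high bits.
std::string naiveNarrow(const std::wstring& ws) {
    return std::string(ws.begin(), ws.end());
}

void demoNaiveNarrow() {
    const std::wstring ws = L"A\u20AC";     // 'A' followed by the euro sign U+20AC
    const std::string s = naiveNarrow(ws);
    assert(s.size() == ws.size());          // same number of units: nothing skipped
    assert(s[0] == 'A');                    // ASCII survives unchanged
    // The euro sign is reduced to its low byte 0xAC (on mainstream compilers),
    // which reads as an unrelated character in any 8-bit code page.
    assert(static_cast<unsigned char>(s[1]) == 0xAC);
}
```

A correct conversion instead has to encode each character in the target encoding, for example as three UTF-8 bytes for the euro sign.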

Also, is wchar_t required to store Unicode characters? As far as I understand, Unicode characters can be stored in a normal string as well. Why should we use wstring or wchar_t?
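
A std::string can indeed hold UTF-8, because it is just a sequence of bytes; the catch is that size() then counts bytes rather than characters. A minimal sketch of counting code points instead, assuming the input is valid UTF-8 (`countCodePoints` and `demoCount` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes `utf8` is valid UTF-8.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // a lead byte (or ASCII byte) starts a new code point
        }
    }
    return count;
}

void demoCount() {
    // "héllo" written with explicit UTF-8 bytes: 'é' is C3 A9.
    const std::string s = "h\xC3\xA9llo";
    assert(s.size() == 6);              // bytes
    assert(countCodePoints(s) == 5);    // code points
}
```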

Answer

It depends on how you convert them.
You need to specify the source encoding and the target encoding.
std::wstring is not an encoding; it just defines a data type (a string of wchar_t).

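Usually, when people say "Unicode" (at least on Windows), they mean UTF-16, which is what Microsoft Windows uses, and that is usually what a wstring contains there, because wchar_t is 16 bits on Windows. On Linux and macOS wchar_t is typically 32 bits, so a wstring usually holds UTF-32 instead.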
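
A quick way to see this platform dependence is to store a character outside the Basic Multilingual Plane in a wstring. In the sketch below (`wideUnitsForEmoji` is a hypothetical name), U+1F600 occupies two wchar_t units (a UTF-16 surrogate pair) when wchar_t is 16 bits and one unit when it is 32 bits. It assumes the compiler encodes wide literals as UTF-16 or UTF-32 respectively, as MSVC, GCC and Clang do by default.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many wchar_t units U+1F600 needs on this platform and checks
// the expected layout for each wchar_t width.
std::size_t wideUnitsForEmoji() {
    const std::wstring emoji = L"\U0001F600";
    if (sizeof(wchar_t) == 2) {
        // 16-bit wchar_t (Windows): UTF-16, so a high + low surrogate pair.
        assert(emoji.size() == 2);
        assert(emoji[0] == 0xD83D && emoji[1] == 0xDE00);
    } else {
        // 32-bit wchar_t (typical Linux/macOS): UTF-32, one unit per code point.
        assert(emoji.size() == 1);
    }
    return emoji.size();
}
```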

So, one way to convert from UTF-8 to UTF-16 with the standard library is:

     #include <codecvt>
     #include <locale>
     #include <string>

     std::string utf8String = "blah blah";
     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
     std::wstring utf16String = convert.from_bytes( utf8String );
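
Note that std::wstring_convert and std::codecvt_utf8_utf16 have been deprecated since C++17. If you would rather not depend on them, a hand-written decoder is not long. The following is a minimal sketch, not a drop-in replacement for the library facet: it targets std::u16string (UTF-16 on every platform, unlike wstring), and the name `utf8ToUtf16` and the choice to throw std::invalid_argument on malformed input are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 by hand.
// Malformed input (bad lead/continuation bytes, truncation, overlong forms,
// surrogates, values above U+10FFFF) throws std::invalid_argument.
std::u16string utf8ToUtf16(const std::string& in) {
    static const char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (len > in.size() - i) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (cont & 0x3F);   // append 6 payload bits
        }
        if (cp < kMinForLength[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000) {                   // fits in one UTF-16 unit
            out += static_cast<char16_t>(cp);
        } else {                              // needs a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}

void demoUtf8ToUtf16() {
    // U+00E9 U+20AC U+1F600 written as explicit UTF-8 bytes.
    const std::u16string utf16 = utf8ToUtf16("\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80");
    assert(utf16 == u"\u00E9\u20AC\U0001F600");
    assert(utf16.size() == 4);                // the last character takes two units
}
```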

And vice versa:

     std::wstring utf16String = L"blah blah";  // wide literals need the L prefix

     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
     std::string utf8String = convert.to_bytes( utf16String );
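
The reverse direction can also be written by hand, without std::wstring_convert. This is a minimal sketch working on std::u16string with hypothetical names (`utf16ToUtf8`, `demoUtf16ToUtf8`); rejecting unpaired surrogates with std::invalid_argument is a choice of this sketch, not standard behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-16 as UTF-8 by hand.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            // Combine the pair back into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        // Emit 1-4 UTF-8 bytes depending on the code point's range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

void demoUtf16ToUtf8() {
    const std::string utf8 = utf16ToUtf8(u"\u00E9\u20AC\U0001F600");
    assert(utf8 == "\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80");
    assert(utf8.size() == 9);   // 2 + 3 + 4 bytes
}
```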

And to add to the confusion:
When you use std::string with the Windows API (for example in a multibyte build), the text is NOT treated as UTF-8 but as ANSI,
more specifically, as the system's active ANSI code page (for example Windows-1252 on Western European systems).
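
To see why this matters, compare the bytes of the same text in a single-byte ANSI code page and in UTF-8. The sketch below converts ISO-8859-1 (Latin-1) to UTF-8. Windows-1252 matches Latin-1 except in the range 0x80-0x9F, where this simplified sketch just applies the Latin-1 (control character) mapping; `latin1ToUtf8` and `demoLatin1ToUtf8` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Converts Latin-1 bytes to UTF-8: every byte is its own code point
// (U+0000..U+00FF), so bytes >= 0x80 become two UTF-8 bytes.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            out += static_cast<char>(byte);                 // ASCII is identical
        } else {
            out += static_cast<char>(0xC0 | (byte >> 6));   // 110000xx
            out += static_cast<char>(0x80 | (byte & 0x3F)); // 10xxxxxx
        }
    }
    return out;
}

void demoLatin1ToUtf8() {
    const std::string ansi = "caf\xE9";          // "café" in Latin-1 / Windows-1252
    const std::string utf8 = latin1ToUtf8(ansi);
    assert(ansi.size() == 4);
    assert(utf8 == "caf\xC3\xA9");               // same text, 5 bytes in UTF-8
}
```

On Windows itself, MultiByteToWideChar with CP_ACP is the API for converting from the active ANSI code page.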

Most Windows API functions that take strings come in two variants, and compiling with UNICODE defined makes the unsuffixed name refer to the W variant:

CommandA - multibyte - ANSI code page (char strings)
CommandW - Unicode - UTF-16 (wchar_t strings)
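
The selection between the two variants is done by the preprocessor. The sketch below mimics that pattern with hypothetical functions `ShowTextA` and `ShowTextW` (real pairs include MessageBoxA and MessageBoxW); it is illustrative only, since the actual Windows headers contain many more details.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical A/W pair: the A variant takes char strings (ANSI code page),
// the W variant takes wchar_t strings (UTF-16 on Windows). Here they just
// return the string length so the sketch runs on any platform.
std::size_t ShowTextA(const char* ansiText) { return std::strlen(ansiText); }
std::size_t ShowTextW(const wchar_t* wideText) { return std::wcslen(wideText); }

// The unsuffixed name is a macro keyed on UNICODE, in the same way that the
// Windows headers map MessageBox to MessageBoxW or MessageBoxA.
#ifdef UNICODE
#define ShowText ShowTextW
#else
#define ShowText ShowTextA
#endif

void demoShowText() {
#ifdef UNICODE
    assert(ShowText(L"hello") == 5);   // expands to ShowTextW
#else
    assert(ShowText("hello") == 5);    // expands to ShowTextA
#endif
}
```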
