Is a wstring character Unicode? What happens during conversion?
Question
Recently I have been coming across conversions from UTF-8 encoding to string and vice versa. I understand that UTF-8 encoding is used to hold almost all the characters in the world, while with char, the built-in data type used by string, only ASCII values can be stored. For a character in UTF-8 encoding, the number of bytes required in memory varies from one to four, but a 'char' is usually one byte.
My question is: what happens when converting from wstring to string, or from wchar_t to char? Are characters that require more than one byte skipped? It seems to depend on the implementation, but I want to know the correct way of doing it.
Also, is wchar_t required to store Unicode characters? As far as I understand, Unicode characters can be stored in a normal string as well. Why should we use wstring or wchar_t?
Recommended answer
It depends on how you convert them.
You need to specify both the source encoding and the target encoding.
std::wstring is not an encoding format; it just defines a data type (a string of wchar_t elements).
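To make the "are characters skipped?" part concrete, here is a minimal sketch of what happens if you convert element by element with a plain cast instead of going through an encoding converter. The helper name `naive_narrow` is hypothetical. Nothing is skipped: every code unit is kept, but only its low 8 bits survive, so anything above U+00FF is silently corrupted.

```cpp
#include <string>

// Hypothetical helper: narrows each wchar_t to char with a plain cast.
// This is NOT an encoding conversion: it keeps only the low 8 bits of
// every code unit, so characters above U+00FF are corrupted, not skipped.
std::string naive_narrow(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        out.push_back(static_cast<char>(wc));
    }
    return out;
}
```

For example, `naive_narrow(L"\u20AC")` (the euro sign) yields the single byte 0xAC. A real converter, such as the wstring_convert code in this answer, instead emits the three-byte UTF-8 sequence for that character.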
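Now, when people say "Unicode" (especially in a Windows context), they usually mean UTF-16,
which is what Microsoft Windows uses internally, and that is usually what a std::wstring
contains on Windows, where wchar_t is 16 bits wide. On Linux and macOS, wchar_t is typically 32 bits wide and a std::wstring usually holds UTF-32 instead.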
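UTF-16 is itself a variable-length encoding, so even on Windows one wchar_t is not always one character. The sketch below (the function `encode_utf16` is hypothetical, and it assumes a valid code point: at most U+10FFFF and not a surrogate) shows how a code point becomes one or two 16-bit units. It uses std::u16string rather than std::wstring so the result is UTF-16 on every platform, regardless of the width of wchar_t.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// Code points below U+10000 take a single 16-bit unit; U+10000..U+10FFFF
// take two units (a surrogate pair).
// Assumes cp is a valid scalar value (<= 0x10FFFF, not in D800..DFFF).
std::u16string encode_utf16(char32_t cp) {
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    cp -= 0x10000;  // 20 significant bits remain
    const char16_t high = static_cast<char16_t>(0xD800 + (cp >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // bottom 10 bits
    return std::u16string{high, low};
}
```

For example, U+1F600 becomes the surrogate pair 0xD83D 0xDE00, which is why code that assumes "one wchar_t per character" breaks on emoji even on Windows.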
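So, one standard-library way to convert from UTF-8 to UTF-16 (std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17, though still widely available):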
#include <codecvt>
#include <locale>
#include <string>
std::string utf8String = "blah blah";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::wstring utf16String = convert.from_bytes( utf8String ); // throws std::range_error on invalid UTF-8
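Because from_bytes throws std::range_error when the input is not valid UTF-8, real code usually needs to handle that case. A minimal sketch (the wrapper name `utf8_to_wide` is hypothetical; requires C++17 for std::optional) that wraps the same std::wstring_convert conversion and reports failure as std::nullopt:

```cpp
#include <codecvt>
#include <locale>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the std::wstring_convert conversion.
// Note: std::wstring_convert and std::codecvt_utf8_utf16 are deprecated
// since C++17, so compilers may emit deprecation warnings here.
std::optional<std::wstring> utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
    try {
        return convert.from_bytes(utf8);
    } catch (const std::range_error&) {
        return std::nullopt;  // from_bytes throws on malformed UTF-8
    }
}
```

Returning an optional keeps the "bad input" case visible at the call site instead of letting an exception escape from deep inside string handling; whether that is preferable to propagating the exception depends on the application.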
And the reverse, from UTF-16 to UTF-8:
std::wstring utf16String = L"blah blah";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::string utf8String = convert.to_bytes( utf16String );
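Since the <codecvt> facilities are deprecated, some projects write the conversion by hand instead. The following is a sketch of that swapped-in approach, not a standard API: the function `utf16_to_utf8` is hypothetical, it takes std::u16string so the input is UTF-16 on every platform, and it throws std::invalid_argument on an unpaired surrogate.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written UTF-16 -> UTF-8 converter.
// Surrogate pairs are combined into one code point; each code point is
// then emitted as 1 to 4 UTF-8 bytes depending on its magnitude.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one next
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This also answers the "are characters skipped?" question from the other side: a correct converter never drops anything; a character outside ASCII simply becomes two to four bytes in the std::string.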
And to add to the confusion:
when you use std::string on Windows (for example, in a multibyte-character-set build), the text is NOT treated as UTF-8. Windows uses the "ANSI" encoding instead,
more specifically, the active code page of your Windows installation (for example, Windows-1252 on US English systems).
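Because std::string only stores bytes, nothing in the type says which encoding those bytes are in. A minimal sketch (the function `is_valid_utf8` is hypothetical, and it checks only lead/continuation byte structure, not overlong forms, surrogates, or values above U+10FFFF) shows the practical consequence: the single byte 0xE9, which is "é" in Windows-1252, is not valid UTF-8, while the UTF-8 spelling of "é" is the two bytes 0xC3 0xA9.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: each lead byte announces how many 10xxxxxx
// continuation bytes must follow it. Simplified: does not reject
// overlong encodings, surrogate code points, or lead bytes F5..F7.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;                          // continuation bytes expected
        if (c < 0x80) extra = 0;                    // 0xxxxxxx: ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;     // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;     // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;     // 11110xxx
        else return false;                          // stray 10xxxxxx or 0xF8..0xFF
        if (s.size() - i <= extra) return false;    // sequence cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80) return false; // must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

Note also that `std::string("caf\xC3\xA9").size()` is 5, not 4: std::string counts bytes, which is why the same type can hold UTF-8 or code-page text without knowing which.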
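Most string-taking Windows API functions come in two variants that expect these formats (compiling in Unicode, i.e. with UNICODE defined, makes the unsuffixed name map to the W version):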
CommandA (e.g. MessageBoxA) - multibyte char strings - ANSI code page
CommandW (e.g. MessageBoxW) - Unicode wchar_t strings - UTF-16