Are wstring characters Unicode? What happens during conversion?

Views: 740 Published: 2020/7/13 5:01:32 Tags: c++ string unicode encoding utf-8

Problem description

Recently I have been dealing with conversion from UTF-8 to string and vice versa. As I understand it, UTF-8 can represent almost every character in the world, whereas char, the built-in element type for string, can only store ASCII values. A character encoded in UTF-8 takes from 1 to 4 bytes in memory, but a 'char' is usually 1 byte.

My question is: what happens during conversion from wstring to string, or from wchar to char? Are characters that require more than one byte skipped? It seems to depend on the implementation, but I want to know the correct way of doing it.

Also, is wchar required to store Unicode characters? As far as I understand, Unicode characters can be stored in a normal string as well. Why should we use wstring or wchar?

Recommended answer

It depends on how you convert them.
You need to specify the source encoding and the target encoding.
wstring is not an encoding; it only defines a data type.
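To make "it depends on how you convert" concrete, here is a minimal sketch of what happens when a wstring is narrowed element by element with no encoding step at all. The helper names `naive_narrow` and `is_ascii_only` are hypothetical, and the sketch assumes an 8-bit char, as on mainstream platforms. Characters outside ASCII are not skipped; each one is cut down to its low byte and turns into an unrelated value.

```cpp
#include <string>

// Hypothetical helper: narrows each wchar_t to char with no encoding step.
// Assumes an 8-bit char; values above 0x7F do not survive the cast intact.
std::string naive_narrow(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        out.push_back(static_cast<char>(wc));  // only the low byte is kept
    }
    return out;
}

// Hypothetical helper: true when every element is ASCII, which is the only
// case in which naive_narrow loses nothing.
bool is_ascii_only(const std::wstring& wide) {
    for (wchar_t wc : wide) {
        if (static_cast<unsigned long>(wc) > 0x7F) return false;
    }
    return true;
}
```

For example, `naive_narrow(L"abc")` gives back "abc", while the euro sign `L"\u20AC"` becomes the single byte 0xAC, which is neither the original character nor valid UTF-8 on its own.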

On Windows, when people say "Unicode" they usually mean UTF-16, which is what Microsoft Windows uses and what a wstring usually contains there. (On most other platforms wchar_t is 32 bits wide and typically holds UTF-32.)
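A string of UTF-16 code units is not the same as a string of characters, because code points above U+FFFF take two units (a surrogate pair). The sketch below illustrates this with std::u16string, whose element type is 16 bits everywhere, rather than std::wstring. The function names `count_code_points` and `wchar_bits` are hypothetical, and the counter assumes well-formed UTF-16 input.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-16 string.
// Assumes well-formed input, so every low surrogate follows a high surrogate.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        // A low surrogate (0xDC00..0xDFFF) is the second half of a pair,
        // so it does not start a new code point.
        if (u < 0xDC00 || u > 0xDFFF) ++n;
    }
    return n;
}

// Hypothetical helper: width of wchar_t in bits on the current platform.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }
```

With `std::u16string s = u"a\U0001F600";`, `s.size()` is 3 while `count_code_points(s)` is 2. `wchar_bits()` reports the platform's wchar_t width, which is what decides whether a wstring can hold UTF-16 code units or whole UTF-32 code points.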

So, the right way to convert from UTF-8 to UTF-16:

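     #include <codecvt>  // std::codecvt_utf8_utf16
     #include <locale>   // std::wstring_convert
     #include <string>

     std::string utf8String = "blah blah";
     // Note: both facilities are deprecated since C++17.
     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
     std::wstring utf16String = convert.from_bytes( utf8String );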

And the reverse:

     std::wstring utf16String = L"blah blah";  // wide literal needs the L prefix

     std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
     std::string utf8String = convert.to_bytes( utf16String );  // UTF-8 result
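Since std::wstring_convert and std::codecvt_utf8_utf16 are deprecated as of C++17, it may help to see what such a conversion does underneath. The following is a minimal hand-written sketch, not the standard library's implementation. All function names are hypothetical, it uses std::u16string instead of std::wstring so the element type is 16 bits on every platform, and its validation is deliberately minimal (overlong forms and UTF-8-encoded surrogates are not rejected).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Appends one code point as UTF-16 (a surrogate pair above U+FFFF).
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Appends one code point as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Decodes UTF-8 bytes and re-encodes them as UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes that must follow
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        append_utf16(out, cp);
        i += extra + 1;
    }
    return out;
}

// Decodes UTF-16 code units and re-encodes them as UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a partner
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Nothing is skipped in either direction: the three UTF-8 bytes E2 82 AC (the euro sign) become the single UTF-16 unit 0x20AC, and the four bytes F0 9F 98 80 become the surrogate pair D83D DE00. In production code a maintained library or the platform API is likely the safer choice, since full validation has more cases than this sketch handles.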

And to add to the confusion:
When you use std::string on a Windows platform (for example, when compiling with the multibyte character set), it is NOT UTF-8. It uses ANSI.
More specifically, it uses the default code page that your Windows installation is configured with.
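The practical consequence is that the same bytes in a std::string mean different things depending on the assumed encoding. As a rough illustration only, the sketch below converts single-byte text to UTF-8 under the assumption that the ANSI code page is Windows-1252 and that the text only uses bytes where Windows-1252 coincides with ISO-8859-1 (0x00 to 0x7F and 0xA0 to 0xFF). The name `latin1_to_utf8` is hypothetical; on Windows itself, the usual route is the API function MultiByteToWideChar with CP_ACP, which handles whatever code page is actually configured.

```cpp
#include <string>

// Hypothetical helper: treats each input byte as an ISO-8859-1 character,
// whose code point equals the byte value, and emits it as UTF-8.
// Assumption: bytes 0x80..0x9F, where Windows-1252 differs, do not occur.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));  // ASCII is unchanged
        } else {
            // Code points 0x80..0xFF need two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

The single byte 0xE9 ("é" in that code page) becomes the two bytes C3 A9 in UTF-8, which is why passing ANSI bytes straight to a UTF-8 converter fails or produces garbage.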

Windows API functions come in two variants, and each expects a specific encoding (compiling for Unicode makes the unsuffixed name resolve to the W variant):

CommandA - multibyte - ANSI
CommandW - Unicode - UTF-16
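The A/W naming works through a preprocessor switch. The sketch below imitates that pattern with made-up names: `DescribeA`, `DescribeW`, `Describe` and `TEXT_LIT` are hypothetical stand-ins, not real Windows API functions or macros. It only shows how one unsuffixed name can resolve to either variant depending on whether `UNICODE` is defined at compile time.

```cpp
#include <string>

// Hypothetical stand-ins for an A/W function pair; NOT real Windows API calls.
std::string DescribeA(const char*)    { return "A: multibyte (ANSI code page)"; }
std::string DescribeW(const wchar_t*) { return "W: Unicode (UTF-16)"; }

// The unsuffixed name and the literal macro follow the UNICODE setting.
#ifdef UNICODE
#define Describe DescribeW
#define TEXT_LIT(s) L##s   // wide literal for the W variant
#else
#define Describe DescribeA
#define TEXT_LIT(s) s      // narrow literal for the A variant
#endif
```

A call written as `Describe(TEXT_LIT("hello"))` compiles to `DescribeW(L"hello")` when `UNICODE` is defined and to `DescribeA("hello")` otherwise. Calling the suffixed variant directly avoids depending on the build setting.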
