char vs wchar_t vs char16_t vs char32_t (C++11)

Views: 700 | Published: 2016/10/22 18:59:43 | Tags: c++, c++11


Question

From what I understand, a char is safe to house ASCII characters, whereas char16_t and char32_t are safe to house characters from Unicode, one for the 16-bit variety and another for the 32-bit variety (should I have said "a" instead of "the"?). But I'm then left wondering what the purpose behind wchar_t is. Should I ever use that type in new code, or is it simply there to support old code? What was the purpose of wchar_t in old code if, from what I understand, its size had no guarantee to be bigger than a char? Clarification would be nice!

Recommended answer

char is for 8-bit code units, char16_t is for 16-bit code units, and char32_t is for 32-bit code units. Any of these can be used for 'Unicode': UTF-8 uses 8-bit code units, UTF-16 uses 16-bit code units, and UTF-32 uses 32-bit code units.
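To make the code-unit distinction concrete, here is a small sketch that counts how many code units the same character occupies in each encoding. The struct and function names (UnitCounts, euro_sign, grinning_face) are made up for illustration, and the UTF-8 bytes are written out by hand as an assumption-free way to avoid depending on the source file's encoding.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// The language guarantees minimum widths for the fixed-purpose types.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds a UTF-32 code unit");

// Hypothetical helper type: code units needed per encoding for one character.
struct UnitCounts {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// U+20AC EURO SIGN: inside the Basic Multilingual Plane.
inline UnitCounts euro_sign() {
    return { std::string("\xE2\x82\xAC").size(),      // UTF-8 bytes, written by hand
             std::u16string(u"\u20AC").size(),
             std::u32string(U"\u20AC").size() };
}

// U+1F600 GRINNING FACE: outside the BMP, so UTF-16 needs a surrogate pair.
inline UnitCounts grinning_face() {
    return { std::string("\xF0\x9F\x98\x80").size(),  // UTF-8 bytes, written by hand
             std::u16string(u"\U0001F600").size(),
             std::u32string(U"\U0001F600").size() };
}
```

The euro sign takes 3, 1 and 1 code units in UTF-8, UTF-16 and UTF-32 respectively, while the emoji takes 4, 2 and 1. Only UTF-32 keeps one code unit per code point.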

The guarantee made for wchar_t was that any character supported in a locale could be converted from char to wchar_t, and that whatever representation was used for char, be it multiple bytes, shift codes, or something else, the wchar_t would be a single, distinct value. The purpose of this was that you could then manipulate wchar_t strings with the same simple algorithms used for ASCII.
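As a rough illustration of that char-to-wchar_t conversion, the sketch below decodes a multibyte string using the standard std::mbstowcs function. The helper name widen is hypothetical, and the result depends on whatever C locale is active (set with std::setlocale), so treat it as a sketch rather than a portable recipe.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a multibyte string, interpreted in the
// encoding of the current C locale, into one wchar_t per character.
inline std::wstring widen(const std::string& narrow) {
    // A multibyte string never decodes to more wide characters than it
    // has bytes, so size() + 1 (for the terminator) is always enough.
    std::vector<wchar_t> buf(narrow.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for this locale");
    }
    return std::wstring(buf.data(), n);
}
```

In the default "C" locale this only handles plain ASCII reliably; with a locale such as a Shift-JIS or UTF-8 one selected, each multibyte sequence would collapse into a single wchar_t, which is the property described above.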

For example, converting ASCII to upper case goes like:

#include <locale>

auto loc = std::locale("");  // the user's preferred locale

char s[] = "hello";
for (char &c : s) {
  c = std::toupper(c, loc);  // converts one char at a time
}

But this won't handle converting all characters in UTF-8 to uppercase, or all characters in some other encoding like Shift-JIS, because a single char holds only one code unit of a multibyte character. People wanted to be able to internationalize this code like so:

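#include <locale>

auto loc = std::locale("");  // the user's preferred locale

wchar_t s[] = L"hello";
for (wchar_t &c : s) {
  c = std::toupper(c, loc);  // one wchar_t is meant to be one whole character
}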

So every wchar_t is a 'character', and if it has an uppercase version then it can be converted directly. Unfortunately this doesn't work all the time. For example, there are oddities in some languages, such as the German letter ß, whose uppercase version is actually the two characters SS instead of a single character.
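The ß case shows why a per-character loop cannot be enough: the output can be longer than the input. The toy function below is a hypothetical sketch (to_upper_sketch is not a standard function) that handles only ASCII letters and ß, purely to illustrate that uppercasing has to work on whole strings.

```cpp
#include <string>

// Hypothetical toy helper: uppercases ASCII letters and expands
// U+00DF (LATIN SMALL LETTER SHARP S) to "SS". Everything else is
// copied unchanged; real case mapping needs full Unicode tables.
inline std::u32string to_upper_sketch(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u00DF') {
            out += U"SS";  // one input character becomes two
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;
        }
    }
    return out;
}
```

Calling to_upper_sketch(U"stra\u00DFe") turns the six-character input into the seven-character U"STRASSE", something an in-place loop over wchar_t elements cannot express.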

So internationalized text handling is intrinsically harder than ASCII and cannot really be simplified in the way the designers of wchar_t intended. As such, wchar_t and wide characters in general provide little value.

The only reason to use them is that they've been baked into some APIs and platforms. However, I prefer to stick to UTF-8 in my own code even when developing on such platforms, and to just convert at the API boundaries to whatever encoding is required.
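Converting at an API boundary might look something like the following hand-rolled sketch, which turns UTF-8 into UTF-16 code units. The function name utf8_to_utf16 is hypothetical, and the validation is deliberately minimal (it does not reject overlong forms or encoded surrogates), so a production program would more likely rely on a platform API or a well-tested library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16.
// Minimal validation only: bad lead bytes, bad continuation bytes and
// truncated input are rejected; overlong forms are not detected.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // fits in one code unit
        } else {
            cp -= 0x10000;                     // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

With a helper like this, the rest of the program can keep std::string in UTF-8 and call the conversion only at the point where a UTF-16 API is invoked.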
