char vs wchar_t vs char16_t vs char32_t (C++11)

Question

From what I understand, a char is safe for holding ASCII characters, whereas char16_t and char32_t are safe for holding Unicode characters, one for the 16-bit variety and the other for the 32-bit variety (should I have said "a" instead of "the"?). But that leaves me wondering what the purpose of wchar_t is. Should I ever use that type in new code, or is it simply there to support old code? What was the purpose of wchar_t in old code if, from what I understand, its size was never guaranteed to be larger than a char? Clarification would be nice!

Recommended answer

char is for 8-bit code units, char16_t is for 16-bit code units, and char32_t is for 32-bit code units. Any of these can be used for 'Unicode'; UTF-8 uses 8-bit code units, UTF-16 uses 16-bit code units, and UTF-32 uses 32-bit code units.
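
To make the idea of code units concrete, here is a minimal sketch that counts how many code units a single code point occupies in each encoding form. The helper name code_units is made up for this illustration; it relies only on the array length of a string literal.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+1F600 lies outside the Basic Multilingual Plane, so it needs
// four UTF-8 code units, two UTF-16 code units (a surrogate pair),
// and one UTF-32 code unit.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32");
```

The same code point is one character to a reader, but the number of array elements it takes depends on which of the three types holds it.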

The guarantee made for wchar_t was that any character supported in a locale could be converted from char to wchar_t, and whatever representation was used for char, be it multiple bytes, shift codes, what have you, the wchar_t would be a single, distinct value. The purpose of this was that then you could manipulate wchar_t strings just like the simple algorithms used with ASCII.
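
As a rough sketch of that char to wchar_t conversion, the standard C function std::mbsrtowcs decodes a multibyte char string into wide characters according to the current C locale. The wrapper name widen is an assumption made for this example, and the sketch stops at the first embedded null.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a multibyte string in the current C locale's
// encoding to a wide string, one wchar_t per character the locale supports.
std::wstring widen(const std::string& narrow) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow.c_str();

    // First pass with a null destination only measures the result.
    const std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for this locale");
    }

    // Second pass performs the conversion from a fresh shift state.
    std::wstring wide(len, L'\0');
    state = std::mbstate_t();
    src = narrow.c_str();
    std::mbsrtowcs(&wide[0], &src, len, &state);
    return wide;
}
```

In the default "C" locale this only handles ASCII reliably; a program would normally call std::setlocale(LC_ALL, "") first so that the user's encoding is used.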

For example, converting ASCII to uppercase goes like this:

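#include <locale>

// User's preferred locale; construction throws std::runtime_error if it is unavailable
auto loc = std::locale("");

char s[] = "hello";
for (char &c : s) {  // also visits the terminating '\0', which toupper leaves unchanged
  c = std::toupper(c, loc);
}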

But this won't handle converting all characters in UTF-8 to uppercase, or all characters in some other encoding like Shift-JIS. People wanted to be able to internationalize this code like so:

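#include <locale>

// User's preferred locale; construction throws std::runtime_error if it is unavailable
auto loc = std::locale("");

wchar_t s[] = L"hello";
for (wchar_t &c : s) {  // one element per character, plus the terminating L'\0'
  c = std::toupper(c, loc);
}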
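
The difference between the two loops comes down to what one array element represents. The following sketch uses a made-up template, upper_each, and the classic "C" locale so that its behavior does not depend on the machine; with a real user locale the wide version may also convert non-ASCII letters.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases every element independently,
// mirroring the element-by-element loops above.
template <class CharT>
std::basic_string<CharT> upper_each(std::basic_string<CharT> s,
                                    const std::locale& loc = std::locale::classic()) {
    for (CharT& c : s) {
        c = std::toupper(c, loc);
    }
    return s;
}

// "é" (U+00E9) is two char elements in UTF-8 but a single wchar_t element.
const std::string narrow_e_acute = "\xC3\xA9";
const std::wstring wide_e_acute = L"\u00E9";
```

Here narrow_e_acute.size() is 2 while wide_e_acute.size() is 1, so the char loop sees two unrelated bytes (and, in the classic locale, leaves both untouched) where the wchar_t loop sees one character it can look up.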

So every wchar_t is a 'character', and if it has an uppercase version then it can be converted directly. Unfortunately, this doesn't work all the time. Some languages have oddities such as the German letter ß, whose uppercase version is the two characters SS rather than a single character.
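
A one-in, one-out function like toupper cannot express that mapping, because it has to return a single wchar_t. As a minimal sketch of what handling it involves, the hypothetical helper below special-cases ß only and builds a new string whose length may differ from the input; real full case mapping covers many more cases than this.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases a wide string, expanding U+00DF (ß) to "SS".
// Every other character goes through the usual one-to-one toupper in the
// classic "C" locale, so only ASCII letters are otherwise changed.
std::wstring upper_with_eszett(const std::wstring& in) {
    const std::locale& loc = std::locale::classic();
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t c : in) {
        if (c == L'\u00DF') {
            out += L"SS";  // one input character becomes two output characters
        } else {
            out += std::toupper(c, loc);
        }
    }
    return out;
}
```

For the six-character input L"stra\u00DFe" the result is the seven-character L"STRASSE", which an in-place loop over the original array has no room to store.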

So internationalized text handling is intrinsically harder than ASCII and cannot really be simplified in the way the designers of wchar_t intended. As such, wchar_t and wide characters in general provide little value.

The only reason to use them is that they've been baked into some APIs and platforms. However, I prefer to stick to UTF-8 in my own code even when developing on such platforms, and to just convert at the API boundaries to whatever encoding is required.
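
As an illustration of converting at the boundary, here is a minimal hand-written sketch that turns UTF-8 into UTF-16 code units, the form some wide-character APIs expect. The function name utf8_to_utf16 is an assumption for this example, and the validation is deliberately light: it does not reject overlong encodings or encoded surrogates, so a production program would more likely use a vetted library or the platform's own conversion routine.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical boundary helper: decodes UTF-8 and re-encodes it as UTF-16.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) {
            throw std::invalid_argument("code point out of Unicode range");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a helper of this kind, the rest of the program can keep its strings in std::string as UTF-8 and only produce the 16-bit form at the call site that needs it.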
