How well is Unicode supported in C++11?
Question
I've read and heard that C++11 supports Unicode. A few questions on that:
- How well does the C++ standard library support Unicode?
- Does std::string do what it should?
- How do I use it?
- Where are potential problems?
Recommended answer
How well does the C++ standard library support Unicode?
Terribly.
A quick scan through the library facilities that might provide Unicode support gives me this list:
- Strings library
- Localization library
- Input/output library
- Regular expressions library
I think all but the first one provide terrible support. I'll get back to it in more detail after a quick detour through your other questions.
Does std::string do what it should?
Yes. According to the C++ standard, this is what std::string and its siblings should do:
"The class template basic_string describes objects that can store a sequence consisting of a varying number of arbitrary char-like objects with the first element of the sequence at position zero."
Well, std::string does that just fine. Does that provide any Unicode-specific functionality? No.
Should it? Probably not. std::string is fine as a sequence of char objects. That's useful; the only annoyance is that it is a very low-level view of text and standard C++ doesn't provide a higher-level one.
How do I use it?
Use it as a sequence of char objects; pretending it is something else is bound to end in pain.
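To make the "sequence of char objects" point concrete, here is a minimal sketch. It assumes the bytes stored in the string are UTF-8, which is our convention and not something std::string knows about. The helpers e_acute_utf8, count_utf8_code_points and demo are hypothetical names made up for this illustration, not library facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "é" (U+00E9) written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of the source file. That these bytes are UTF-8
// is an assumption of this sketch; std::string does not track it.
inline std::string e_acute_utf8() { return std::string("\xC3\xA9"); }

// Hypothetical helper: counts code points in a well-formed UTF-8 string
// by skipping continuation bytes (those of the form 10xxxxxx).
inline std::size_t count_utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

inline void demo() {
    const std::string s = e_acute_utf8();
    assert(s.size() == 2);                   // size() counts char objects
    assert(count_utf8_code_points(s) == 1);  // but there is one code point
    assert(s.substr(0, 1) == "\xC3");        // substr can cut it in half
}
```

Every std::string member (size, substr, find, operator[]) works in units of char, so any notion of "character" has to be layered on top by the caller.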
Where are potential problems?
All over the place? Let's see...
Strings library
The strings library provides us basic_string, which is merely a sequence of what the standard calls "char-like objects". I call them code units. If you want a high-level view of text, this is not what you are looking for. This is a view of text suitable for serialization/deserialization/storage.
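The "code units" view applies to the wider string types as well. The sketch below relies on the C++11 u"" and U"" literals being UTF-16 and UTF-32 respectively; the function name code_unit_demo is a hypothetical name for this illustration.

```cpp
#include <cassert>
#include <string>

inline void code_unit_demo() {
    // U+1F600 lies outside the Basic Multilingual Plane: it takes a
    // surrogate pair (two code units) in UTF-16 and one unit in UTF-32.
    const std::u16string s16 = u"\U0001F600";
    const std::u32string s32 = U"\U0001F600";
    assert(s16.size() == 2);
    assert(s32.size() == 1);

    // "e" followed by U+0301 COMBINING ACUTE ACCENT is usually displayed
    // as a single accented letter, yet it is two code points, so even a
    // UTF-32 string does not count user-perceived characters.
    const std::u32string combined = U"e\u0301";
    assert(combined.size() == 2);
}
```

In all three cases size() reports the number of stored code units, which is the right number for storage and serialization and the wrong one for anything a user would call a character.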
It also provides some tools from the C library that can be used to bridge the gap between the narrow world and the Unicode world: c16rtomb/mbrtoc16 and c32rtomb/mbrtoc32.
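As a rough sketch of how these bridging functions are called, the helpers below convert a single character in each direction. The names narrow_to_utf32, utf32_to_narrow and round_trip_demo are hypothetical. The narrow side uses whatever multibyte encoding the current C locale defines, so in the default "C" locale only ASCII can be relied upon, and the sketch assumes the implementation ships the <cuchar> header, which not every platform does.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <string>

// Hypothetical helper: decodes the first multibyte character of s (at most
// n bytes) into one UTF-32 code unit. Returns false on failure.
inline bool narrow_to_utf32(const char* s, std::size_t n, char32_t& out) {
    std::mbstate_t state{};
    const std::size_t rc = std::mbrtoc32(&out, s, n, &state);
    // (size_t)-1 signals an encoding error, (size_t)-2 an incomplete sequence.
    return rc != static_cast<std::size_t>(-1) &&
           rc != static_cast<std::size_t>(-2);
}

// Hypothetical helper: encodes one UTF-32 code unit in the locale's narrow
// multibyte encoding. Returns an empty string if it cannot be represented.
inline std::string utf32_to_narrow(char32_t c) {
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];
    const std::size_t rc = std::c32rtomb(buf, c, &state);
    if (rc == static_cast<std::size_t>(-1)) return std::string();
    return std::string(buf, rc);
}

inline void round_trip_demo() {
    char32_t c = 0;
    assert(narrow_to_utf32("A", 1, c));  // ASCII works in the "C" locale
    assert(c == U'A');
    assert(utf32_to_narrow(U'A') == "A");
}
```

Note that both functions work one character at a time and carry an mbstate_t, so converting a whole string means looping and checking the return value at every step.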
Localization library
The localization library still believes that one of those "char-like objects" equals one "character". This is of course silly, and makes it impossible to get lots of things working properly beyond some small subset of Unicode like ASCII.
Consider, for example, what the standard calls "convenience interfaces" in the <locale> header:
template <class charT> bool isspace (charT c, const locale& loc);
template <class charT> bool isprint (charT c, const locale& loc);
template <class charT> bool iscntrl (charT c, const locale& loc);
// ...
template <class charT> charT toupper(charT c, const locale& loc);
template <class charT> charT tolower(charT c, const locale& loc);
// ...
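A small sketch of what goes wrong with these interfaces: each call sees exactly one char-like object, so a character that spans several code units can never be handled as a whole. The helper upper_per_code_unit is a hypothetical name, and the example assumes the string holds UTF-8 and uses the classic "C" locale.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: applies the <locale> convenience function
// std::toupper to each char object in turn, which is all the interface
// allows since it takes and returns a single charT.
inline std::string upper_per_code_unit(std::string s, const std::locale& loc) {
    for (char& c : s) {
        c = std::toupper(c, loc);
    }
    return s;
}

inline void toupper_demo() {
    const std::locale& loc = std::locale::classic();
    // ASCII: one code unit per character, so this works.
    assert(upper_per_code_unit("abc", loc) == "ABC");
    // "é" as UTF-8 is the two bytes C3 A9. Uppercase "É" would be C3 89,
    // but neither byte is a lowercase letter on its own, so nothing changes.
    assert(upper_per_code_unit("\xC3\xA9", loc) == "\xC3\xA9");
}
```

The same signature also rules out mappings that change length, such as "ß" uppercasing to "SS", because a single charT goes in and a single charT comes out.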