New unicode characters in C++0x
Question
I'm building an API that allows me to fetch strings in various encodings, including utf8, utf16, utf32 and wchar_t (which may be utf32 or utf16 depending on the OS).
The new C++ standard introduces the new types char16_t and char32_t, which do not have this sizeof ambiguity and should be used in the future, so I would like to support them as well. The question is: would they interfere with the normal uint16_t, uint32_t and wchar_t types, preventing overloading because they may refer to the same type?

    class some_class {
    public:
        void set(std::string);                  // utf8 string
        void set(std::wstring);                 // wchar string, utf16 or utf32 according
                                                // to sizeof(wchar_t)
        void set(std::basic_string<uint16_t>);  // wchar independent utf16 string
        void set(std::basic_string<uint32_t>);  // wchar independent utf32 string
    #ifdef HAVE_NEW_UNICODE_CHARRECTERS
        void set(std::basic_string<char16_t>);  // new standard utf16 string
        void set(std::basic_string<char32_t>);  // new standard utf32 string
    #endif
    };

So I can just write:

    foo.set(U"Some utf32 String");
    foo.set(u"Some utf16 string");

What are the typedefs for std::basic_string<char16_t> and std::basic_string<char32_t>, analogous to today's typedef basic_string<wchar_t> wstring?
I can't find any reference.
Edit: according to the headers of gcc-4.4, which introduced these new types:

    typedef basic_string<char16_t> u16string;
    typedef basic_string<char32_t> u32string;

I just want to make sure that this is an actual standard requirement and not a gcc-ism.
1) char16_t and char32_t will be distinct new types, so overloading on them will be possible.
Quote from ISO/IEC JTC1 SC22 WG21 N2018 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2018.html):
"Define char16_t to be a typedef to a distinct new type, with the name _Char16_t that has the same size and representation as uint_least16_t. Likewise, define char32_t to be a typedef to a distinct new type, with the name _Char32_t that has the same size and representation as uint_least32_t."
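The "distinct type, same size" property can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler in which char16_t and char32_t are built-in types; the function name kind and its return values are hypothetical and exist only to show which overload is selected.

```cpp
#include <cstdint>
#include <type_traits>

// char16_t and char32_t are types of their own, not aliases for the
// least-width integer typedefs or for wchar_t.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is distinct from uint_least16_t");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t is distinct from uint_least32_t");
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t is distinct from wchar_t");
static_assert(!std::is_same<char32_t, wchar_t>::value,
              "char32_t is distinct from wchar_t");

// They still share the size of the corresponding least-width integer.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "same size");
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "same size");

// Hypothetical overload set: because the types are distinct, all of these
// declarations can coexist without redefinition errors.
constexpr int kind(char16_t)            { return 16; }
constexpr int kind(char32_t)            { return 32; }
constexpr int kind(wchar_t)             { return 0; }
constexpr int kind(std::uint_least16_t) { return -16; }
```

With these declarations, kind(u'a') selects the char16_t overload and kind(U'a') the char32_t one, while an argument of type std::uint_least16_t still reaches its own overload.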
Further explanation (from a devx.com article "Prepare Yourself for the Unicode Revolution"):
"You're probably wondering why the _Char16_t and _Char32_t types and keywords are needed in the first place when the typedefs uint_least16_t and uint_least32_t are already available. The main problem that the new types solve is overloading. It's now possible to overload functions that take _Char16_t and _Char32_t arguments, and create specializations such as std::basic_string<_Char16_t> that are distinct from std::basic_string<wchar_t>."
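Applied to the class from the question, this could look roughly like the sketch below. It assumes a C++11 compiler; the encoding enum, the last() accessor and check_overloads() are hypothetical additions that only record which overload ran, and the uint16_t/uint32_t overloads are left out to keep the example short.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag recording which overload was called last.
enum class encoding { utf8, wide, utf16, utf32 };

// Stand-in for the question's some_class; it stores no string data.
class some_class {
public:
    void set(const std::string&)                 { last_ = encoding::utf8; }
    void set(const std::wstring&)                { last_ = encoding::wide; }
    void set(const std::basic_string<char16_t>&) { last_ = encoding::utf16; }
    void set(const std::basic_string<char32_t>&) { last_ = encoding::utf32; }

    encoding last() const { return last_; }

private:
    encoding last_ = encoding::utf8;
};

// Calls each overload with the matching literal prefix and checks
// that the expected one was selected.
inline void check_overloads() {
    some_class foo;
    foo.set(U"Some utf32 String");
    assert(foo.last() == encoding::utf32);
    foo.set(u"Some utf16 string");
    assert(foo.last() == encoding::utf16);
    foo.set(L"Some wide string");
    assert(foo.last() == encoding::wide);
    foo.set("Some utf8 string");
    assert(foo.last() == encoding::utf8);
}
```

Each literal prefix yields an array of a different character type, and only one basic_string specialization can be constructed from it, so the calls are unambiguous.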
2) u16string and u32string are indeed part of C++0x and not just GCC-isms, as they are mentioned in various standard draft papers. They will be included in the new <string> header. Quote from the same article:
"The Standard Library will also provide _Char16_t and _Char32_t typedefs, in analogy to the typedefs wstring, wcout, etc., for the following standard classes: filebuf, streambuf, streampos, streamoff, ios, istream, ostream, fstream, ifstream, ofstream, stringstream, istringstream, ostringstream, string"
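For the string typedefs specifically, a short compile-time check can confirm that std::u16string and std::u32string name the expected specializations. This is a sketch assuming a C++11 standard library; code_units() and check_code_units() are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// The typedefs from <string> name exactly these specializations.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "wstring is basic_string<wchar_t>");

// Hypothetical helpers: number of code units (not code points) in a string.
inline std::size_t code_units(const std::u16string& s) { return s.size(); }
inline std::size_t code_units(const std::u32string& s) { return s.size(); }

// U+1F600 lies outside the Basic Multilingual Plane: UTF-16 needs a
// surrogate pair (two code units), UTF-32 needs a single code unit.
inline void check_code_units() {
    assert(code_units(u"\U0001F600") == 2);
    assert(code_units(U"\U0001F600") == 1);
}
```

The same character therefore reports a different size() depending on the string type, which is one reason to keep the UTF-16 and UTF-32 overloads separate.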