Is the u8 string literal necessary in C++11?
Question
From Wikipedia:
For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be at least the size necessary to store an eight-bit coding of UTF-8.
I'm wondering what exactly this means for writing portable applications. Is there any difference between writing this
const char str[] = "Test String";
or this?
const char str[] = u8"Test String";
Is there any reason not to use the latter for every string literal in your code?
What happens when the test string contains non-ASCII characters?
Recommended answer
The encoding of "Test String" is the implementation-defined system encoding (the narrow, possibly multibyte one).
The encoding of u8"Test String" is always UTF-8.
The examples aren't terribly telling, because every character in them is plain ASCII. If you included universal character names (such as \U0010FFFF) in the string, the u8 literal would always contain them, encoded as UTF-8. Whether they can be represented in the system-encoded string, and if so what their value would be, is implementation-defined.
If it helps, imagine you're authoring the source code on an EBCDIC machine. Then the literal "Test String" is always EBCDIC-encoded in the source file itself, but the u8-initialized array contains UTF-8 encoded values, whereas the first array contains EBCDIC-encoded values.