Is the u8 string literal necessary in C++11?

Question

From Wikipedia:

For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be at least the size necessary to store an eight-bit coding of UTF-8.

I'm wondering what exactly this means for writing portable applications. Is there any difference between writing this

const char str[] = "Test String";

or this?

const char str[] = u8"Test String";

Is there any reason not to use the latter for every string literal in your code?

What happens when there are non-ASCII characters inside the test string?

Recommended answer

The encoding of "Test String" is the implementation-defined system encoding (the narrow, possibly multibyte one).

The encoding of u8"Test String" is always UTF-8.

The examples aren't terribly telling, because every character in them is plain ASCII. If you included some universal character names (such as \U0010FFFF) in the string, then in the u8 literal you would always get exactly those code points, encoded as UTF-8. Whether they can be expressed in the system-encoded string at all, and if so what their values would be, is implementation-defined.

If it helps, imagine you're authoring the source code on an EBCDIC machine. Then the literal "Test String" is always EBCDIC-encoded in the source file itself, but the u8-initialized array contains UTF-8 encoded values, whereas the first array contains EBCDIC-encoded values.
