Is '\u0B95' a multicharacter literal?


Question


In a previous answer, I claimed that the following warning appears because '\u0B95' requires three bytes and is therefore a multicharacter literal:

warning: multi-character character constant [-Wmultichar]
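The "three bytes" claim can be checked with a small sketch. It assumes UTF-8 as the encoding of interest, which is why it uses a u8 string literal instead of the ordinary character literal from the question; the helper name utf8_length_of_ka is made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of UTF-8 code units needed for U+0B95.
// The u8 string literal is an array holding the encoded bytes plus a
// terminating null, so subtracting 1 leaves the encoded length.
constexpr std::size_t utf8_length_of_ka() {
    return sizeof(u8"\u0B95") - 1;
}

// U+0B95 encodes as the three bytes E0 AE 95 in UTF-8.
static_assert(utf8_length_of_ka() == 3, "U+0B95 needs three UTF-8 code units");
```

This only shows how many bytes the character occupies in UTF-8. It says nothing about how many c-chars the literal contains, which is the distinction the rest of the question turns on.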

But actually, I don't think I'm right and I don't think gcc is either. The standard states:

An ordinary character literal that contains more than one c-char is a multicharacter literal.

One production rule for c-char is a universal-character-name (i.e. \uXXXX or \UXXXXXXXX). Since \u0B95 is a single c-char, this is not a multicharacter literal. But now it gets messy. The standard also says:

An ordinary character literal that contains a single c-char has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
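As a minimal sketch of the quoted rule, the type of a single c-char literal can be inspected with decltype. The example uses 'a', a member of the basic character set, so that it compiles without diagnostics; the variable name letter_a is an assumption made for illustration.

```cpp
#include <type_traits>

// An ordinary character literal with a single c-char has type char.
static_assert(std::is_same<decltype('a'), char>::value,
              "a single c-char literal has type char");

// In C++ (unlike C) such a literal therefore occupies exactly one byte.
static_assert(sizeof('a') == 1, "sizeof(char) is 1 by definition");

// Hypothetical variable holding the literal's value.
constexpr char letter_a = 'a';
```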

So my literal has type char and value of the character in the execution character set (or implementation-defined value if it does not exist in that set). char is only defined to be large enough to store any member of the basic character set (which is not actually defined by the standard, but I assume it means the basic execution character set):

Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set.

Therefore, since the execution character set may contain members beyond the basic set that a char is guaranteed to hold, my character may not fit in a char.
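The size argument can be sketched as follows. Two assumptions are made here that the standard does not guarantee: char is 8 bits wide, and the code point 0x0B95 stands in for the character's value (the real value in the execution character set is implementation-defined). The helper fits_in_one_char is hypothetical.

```cpp
#include <climits>
#include <limits>

// Assumption: 8-bit char, as on most mainstream platforms.
static_assert(CHAR_BIT == 8, "this sketch assumes an 8-bit char");

// Hypothetical helper: can `value` be stored in a single char-sized unit?
constexpr bool fits_in_one_char(unsigned long value) {
    return value <= std::numeric_limits<unsigned char>::max();
}

// A basic character set member fits; 0x0B95 (used as a stand-in) does not.
static_assert(fits_in_one_char('a'), "basic character set members fit");
static_assert(!fits_in_one_char(0x0B95), "0x0B95 exceeds an 8-bit char");
```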

So what value does my char have? This doesn't seem to be defined anywhere. The standard does say that for char16_t literals, if the value is not representable, the program is ill-formed. It says nothing about ordinary literals, though.
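For contrast, the same universal-character-name is unproblematic in the prefixed literal forms. A short sketch (the names ka16 and ka32 are made up for illustration):

```cpp
#include <type_traits>

// With the u prefix the literal has type char16_t, and U+0B95 fits in
// a single 16-bit code unit, so the literal is well-formed.
static_assert(std::is_same<decltype(u'\u0B95'), char16_t>::value,
              "u'...' has type char16_t");
static_assert(u'\u0B95' == 0x0B95, "value is the code point");

// With the U prefix the literal has type char32_t.
static_assert(std::is_same<decltype(U'\u0B95'), char32_t>::value,
              "U'...' has type char32_t");

// Hypothetical names for the two values.
constexpr char16_t ka16 = u'\u0B95';
constexpr char32_t ka32 = U'\u0B95';

// A code point above U+FFFF, such as u'\U0001F600', would not fit in one
// char16_t code unit; that is the case the standard calls ill-formed, so
// it is deliberately left out of this sketch.
```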

So what's going on? Is this just a mess in the standard or am I missing something?

Answer

You are correct: according to the spec, '\u0B95' is a char-typed character literal with a value equal to the character's encoding in the execution character set. You are also right that the spec says nothing about the case where a single char cannot represent that value. Because the spec leaves this case unspecified, the behavior is undefined.

There are defect reports filed with the committee on this issue: E.g., http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#912

The currently proposed resolution seems to be to specify that these character literals are also ints and have implementation-defined values (although the proposed wording isn't quite right for that), just like multicharacter literals. I'm not a fan of that solution; I think a better one is to say that such literals are ill-formed.

Treating such literals as ill-formed is what clang implements: http://coliru.stacked-crooked.com/a/952ce7775dcf7472
