Is unsigned char a[4][5]; a[1][7]; undefined behavior?


Question

One of the examples of undefined behavior from the C standard reads (J.2):

— An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6)

If the declaration is changed from int a[4][5] to unsigned char a[4][5], does accessing a[1][7] still result in undefined behavior? My opinion is that it does not, but I have heard from others who disagree, and I'd like to see what some other would-be experts on SO think.

My reasoning:

  • By the usual interpretation of 6.2.6.1 paragraph 4, and 6.5 paragraph 7, the representation of the object a is sizeof (unsigned char [4][5])*CHAR_BIT bits and can be accessed as an array of type unsigned char [20] overlapped with the object.

  • a[1] has type unsigned char [5] as an lvalue, but used in an expression (as an operand to the [] operator, or equivalently as an operand to the + operator in *(a[1]+7)), it decays to a pointer of type unsigned char *.

  • The value of a[1] is also a pointer to a byte of the "representation" of a in the form unsigned char [20]. Interpreted in this way, adding 7 to a[1] is valid.

Recommended answer

A compiler vendor who wants to write a conforming compiler is bound to what the Standard has to say, but not to your reasoning. The Standard says that an array subscript out of range is undefined behaviour, without any exception, so the compiler is allowed to blow up.

To cite my comment from our last discussion ("Does C99 guarantee that arrays are contiguous?"):

"Your original question was for a[0][6], with the declaration char a[5][5]. This is UB, no matter what. It is valid to use char *p = &a[3][4]; and access p[0] to p[5]. Taking the address &p[6] is still valid, but accessing p[6] is outside of the object, thus UB. Accessing a[0][6] is outside of the object a[0], which has type array[5] of chars. The type of the result is irrelevant, it is important how you reach it."

There are enough cases of undefined behaviour where you have to scan through the whole Standard, collect the facts and combine them to finally get to the conclusion of undefined behaviour. This one is explicit, and you even cite the sentence from the Standard in your question. It is explicit and leaves no space for any workarounds.

I'm just wondering how much more explicit the reasoning has to be before you are convinced that it really is UB?

Edit 2:

After digging through the Standard and collecting information, here is another relevant citation:

6.3.2.1 - 3: Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

So I think this is valid:

unsigned char *p = a[1]; 
unsigned char c = p[7]; // Strict aliasing not applied for char types

This is UB:

unsigned char c = a[1][7];

This is because a[1] is not an lvalue at this point but is evaluated further, which violates J.2 with an array subscript out of range. What really happens depends on how the compiler implements array indexing in multidimensional arrays. So you may be right that it makes no difference on any known implementation. But behaving as expected is a valid outcome of undefined behaviour, too. ;)
