Undefined behavior on reading object using non-character type when last written using character type

Problem description

Assuming unsigned int has no trap representations, does either or both of the statements marked (A) and (B) below provoke undefined behavior, and why or why not? If you think one of them is well-defined but the other isn't, do you consider that a defect in the standard? I am primarily interested in the current version of the C standard (i.e. C2011), but if this is different in older versions of the standard, or in C++, I would also like to know about that.

(_Alignas is used in this program to eliminate any question of UB due to inadequate alignment. The rules I discuss in my interpretation, though, say nothing about alignment.)

#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned int v1, v2;
    unsigned char _Alignas(unsigned int) b1[sizeof(unsigned int)];
    unsigned char *b2 = malloc(sizeof(unsigned int));

    if (!b2) return 1;

    memset(b1, 0x55, sizeof(unsigned int));
    memset(b2, 0x55, sizeof(unsigned int));

    v1 = *(unsigned int *)b1; /* (A) */
    v2 = *(unsigned int *)b2; /* (B) */

    return !(v1 == v2);
}

My interpretation of C2011 is that (A) provokes undefined behavior but (B) is well-defined (it stores an unspecified value into v2), because:

  • memset is defined (§7.24.6.1) to write to its first argument as if through an lvalue with character type, which is allowed for both b1 and b2 per the special case at the bottom of §6.5p7.

The object b1 has a declared type, unsigned char[n]. Therefore, its effective type for accesses is also unsigned char[n] per 6.5p6. Statement (A) reads b1 via an lvalue expression whose type is unsigned int, which is neither the effective type of b1 nor covered by any of the other exceptions in 6.5p7, so the behavior is undefined.

The object pointed to by b2 has no declared type. The value stored into it (by memset) was written (as if) through an lvalue with character type, so the second case of 6.5p6 does not apply. The value was not copied from anywhere, so the third case of 6.5p6 does not apply either. Therefore, the effective type of the object is the type of the lvalue used for the access, which is unsigned int, and the rules of 6.5p7 are satisfied.

Finally, per 6.2.6.1, assuming unsigned int has no trap representations, the memset operation has created the representation of some unspecified unsigned int value in each of b1 and b2. Therefore, if neither (A) nor (B) provokes undefined behavior, then the actual values in v1 and v2 are unspecified but they are equal.
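For comparison, here is a minimal sketch of a way to obtain the same values without raising the effective-type question at all: copy the bytes into a real unsigned int object with memcpy. The helper names load_uint and compare_filled_buffers are hypothetical and not part of the original program; the sketch keeps the same assumption that unsigned int has no trap representations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy sizeof(unsigned int) bytes from a buffer into
 * an object whose declared type is unsigned int, then return that object.
 * The buffer is only ever read as bytes. */
static unsigned int load_uint(const unsigned char *bytes)
{
    unsigned int value;

    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Hypothetical driver mirroring the program above.
 * Returns 1 if both buffers yield equal values, 0 if they differ,
 * and -1 if the allocation fails. */
static int compare_filled_buffers(void)
{
    unsigned char b1[sizeof(unsigned int)];
    unsigned char *b2 = malloc(sizeof(unsigned int));
    unsigned int v1, v2;

    if (!b2) return -1;

    memset(b1, 0x55, sizeof b1);
    memset(b2, 0x55, sizeof(unsigned int));

    v1 = load_uint(b1);   /* counterpart of (A), without the pointer cast */
    v2 = load_uint(b2);   /* counterpart of (B), without the pointer cast */

    free(b2);
    return v1 == v2;
}
```

Because load_uint never dereferences an unsigned int pointer into the byte buffer, it treats the declared array and the malloc'd block identically, and no alignment specifier is needed on b1.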

Commentary:

The asymmetry of the "type-based aliasing" rules (that is, 6.5p7), which permit an object with any effective type to be accessed by an lvalue with character type but not vice versa, is a continual source of confusion. The second case of 6.5p6 seems to have been added specifically so that reading a value initialized by memset (or, for that matter, calloc) is not undefined behavior. Because it only applies to objects with no declared type, however, it is itself an additional source of confusion.
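The permitted direction of that asymmetry can be sketched as follows: an object whose declared type is unsigned int is written and then inspected byte by byte through character-type accesses, which 6.5p7 allows. The helper names fill_uint and all_bytes_equal are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: set every byte of an unsigned int object.
 * memset writes as if through an lvalue of character type. */
static void fill_uint(unsigned int *obj, unsigned char byte)
{
    assert(obj != NULL);
    memset(obj, byte, sizeof *obj);
}

/* Hypothetical helper: return 1 if every byte of the object's
 * representation equals `byte`, otherwise 0. Reading an unsigned int
 * through an unsigned char lvalue is the direction 6.5p7 permits. */
static int all_bytes_equal(const unsigned int *obj, unsigned char byte)
{
    const unsigned char *p = (const unsigned char *)obj;
    size_t i;

    assert(obj != NULL);
    for (i = 0; i < sizeof *obj; i++) {
        if (p[i] != byte) return 0;
    }
    return 1;
}
```

The reverse direction, reading a declared unsigned char array through an unsigned int lvalue, is what statement (A) does.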

Recommended answer

The authors of the Standard acknowledge in the rationale that it would be possible for an implementation to be conforming but useless. Because they expected that implementers would endeavor to make their implementations useful, they didn't think it necessary to mandate every behavior that might be needed to make an implementation suitable for any particular purpose.

The Standard imposes no requirements on the behavior of code that accesses an aligned object of character-array type as some other type. That doesn't mean its authors intended implementations to do anything other than treat the array as untyped storage in cases where code takes the address of the array once but never accesses it directly. The fundamental nature of aliasing is that it requires an item to be accessed in two different ways; if an object is only ever accessed one way, there is by definition no aliasing. Any quality implementation that is supposed to be suitable for low-level programming should behave in a useful fashion in cases where a char[] is used only as untyped storage, whether the Standard requires it or not, and it's hard to imagine any useful purpose that would be impeded by such treatment. The only purpose that would be served by having the Standard mandate such behavior would be to prevent compiler writers from treating the lack of a mandate as being, in and of itself, a reason not to process such code in the obvious useful fashion.
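For code that does not want to depend on how an implementation treats a char[] used as untyped storage, one alternative in C is to give the storage a declared type that already includes unsigned int. The following is a minimal sketch under that assumption; the union uint_storage and the function read_filled are hypothetical names, and the sketch applies to C only, since C++ has different rules for reading a union member other than the one last written.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type: the same storage declared both as an unsigned int
 * and as an array of bytes, so it is aligned for unsigned int. */
union uint_storage {
    unsigned int value;
    unsigned char bytes[sizeof(unsigned int)];
};

/* Hypothetical helper: fill the bytes, then read them back through the
 * unsigned int member. As in the question, this assumes unsigned int
 * has no trap representations. */
static unsigned int read_filled(unsigned char fill)
{
    union uint_storage s;

    memset(s.bytes, fill, sizeof s.bytes);
    assert(s.bytes[0] == fill);   /* sanity check on the byte view */
    return s.value;               /* reinterprets the stored bytes */
}
```

Here the read goes through a member of the union object itself, so no pointer to a character array is ever converted to unsigned int *.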
