Can you access the object representation of any object through a char*?

Question

I have stumbled upon a reddit thread in which a user has found an interesting detail of the C++ standard. The thread has not spawned much constructive discussion, therefore I will retell my understanding of the problem here:

  • OP wants to reimplement memcpy in a standard-compliant way
  • They attempt to do so by using reinterpret_cast<char*>(&foo), relying on the exception to the strict aliasing rules under which an object may be accessed through a char in order to reach its "object representation".
  • [expr.reinterpret.cast] says that this cast has the result of static_cast<cv T*>(static_cast<cv void*>(v)), so reinterpret_cast in this case is equivalent to a static_cast first to void* and then to char*.
  • [expr.static.cast], in combination with [basic.compound], says:

A prvalue of type "pointer to cv1 void" can be converted to a prvalue of type "pointer to cv2 T", where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. [...] if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. [...] [emphasis mine]
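The equivalence described in the bullets above can be sketched as follows. This is only an illustration: the helper name casts_agree is hypothetical, and the check shows that both spellings yield pointers that compare equal. It says nothing about which object the resulting pointer is considered to point to, which is the actual question here.

```cpp
// Hypothetical helper (not from the original thread): compares the result of
// reinterpret_cast<char*> with the two-step static_cast through void*.
template <typename T>
bool casts_agree(T& object) {
    char* via_reinterpret = reinterpret_cast<char*>(&object);
    char* via_static = static_cast<char*>(static_cast<void*>(&object));
    // Equal pointers represent the same address; the standard's notion of
    // "pointer value" (which object is pointed to) is not observable here.
    return via_reinterpret == via_static;
}
```

Calling casts_agree on any non-const object is expected to return true, since the standard defines the first cast in terms of the second.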

Now consider the following union class:

union Foo{
    char c;
    int i;
};
// the OP has used union, but iiuc,
// it can also be a struct for the problem to arise.

OP has thus come to the conclusion that reinterpreting a Foo* as char* in this case yields a pointer pointing to the first char member of the union (or its object representation), rather than to the object representation of the union itself, i.e. it points only to the member. While this appears superficially to be the same, and corresponds to the same memory address, the standard seems to differentiate between the "value" of a pointer and its corresponding address, in that on the abstract C++ machine, a pointer belongs to a certain object only. Incrementing it beyond that object (compare with end() of an array) is undefined behavior.

OP thus argues that if the standard forces the char* to be associated with the object's first member instead of the object representation of the whole union object, dereferencing it after one increment is UB, which allows a compiler to optimize as if it were impossible for the resulting char* to ever access the following bytes of the int member. This implies that it is not possible to legally access the complete object representation of a class object which is pointer-interconvertible with a char member.
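For concreteness, here is a minimal sketch of the kind of byte-wise copy under discussion. The function name naive_memcpy is hypothetical and the code is not taken from the original thread. Mainstream compilers accept and run it as expected, but under the reading described above, every access past the first byte is the step whose validity is in question.

```cpp
#include <cstddef>

union Foo {
    char c;
    int i;
};

// Hypothetical reimplementation of memcpy in the style the OP describes.
// Under the strict reading of the wording, `in` may be a pointer to the
// char member Foo::c rather than to the first byte of the whole object.
void* naive_memcpy(void* dest, const void* src, std::size_t count) {
    char* out = static_cast<char*>(dest);
    const char* in = static_cast<const char*>(src);
    for (std::size_t n = 0; n < count; ++n) {
        out[n] = in[n];  // for n >= 1, this is the access argued to be UB
    }
    return dest;
}
```

A call such as naive_memcpy(&b, &a, sizeof(Foo)) on two Foo objects copies all bytes in practice; the debate is only about whether the abstract machine permits it.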

If I understand correctly, the same would apply if "union" were simply replaced with "struct", but I have taken this example from the original thread.

What do you think? Is this a standard defect? Is it a misinterpretation?

Recommended answer

This video, linked in the comments (now chat) by @KonradRudolph, is likely the answer to the problem.

At around the 40-minute mark, Timur Doumler, who is a member of the ISO C++ committee, discusses the possibility of accessing byte representations. The summary is that any attempt to access the byte representation of an object other than through memcpy is UB. The situation in the OP does not even arise without invoking UB, because treating a pointer to an object as if it pointed into an array, or doing any pointer arithmetic on it, is already UB: as far as the abstract machine is concerned, these operations are only well-defined when dealing with actual array objects.

Also, while reinterpreting a pointer as a char* does not on its own violate aliasing rules, there is technically no guarantee that the resulting char* will point to the first byte of the object.

The only legal way of accessing byte representations is to memcpy the object into a char array. This means that reimplementing memcpy is impossible.
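A minimal sketch of the memcpy-based approach, assuming a trivially copyable type. The helper name object_bytes is hypothetical, and unsigned char is used here instead of plain char only so that byte values print and compare without sign surprises.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies the object representation of `object`
// into a real array object, whose elements can then be indexed freely.
template <typename T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& object) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only defined for trivially copyable types");
    std::array<unsigned char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &object, sizeof(T));
    return bytes;
}
```

Because the returned bytes live in an actual array, iterating over them involves no pointer arithmetic on the original object.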

Timur Doumler additionally describes this as a wording defect that will hopefully be fixed in C++23 and presents a paper that proposes a fix to this.
