union 'punning' structs with "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?

Question

Discussions on the mostly undefined or implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur ( https://stackoverflow.com/a/31557852/2757035 ), on an exemption for standard-layout structs having a "common initial sequence" of member types:

C11 (6.5.2.3 Structure and union members; Semantics):

[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
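
As a minimal sketch of what this wording permits, consider the following C fragment. The names shape, circle, label, shape_kind and demo_kind are hypothetical and appear in neither standard; the only point is that kind is read through the union object while a different member was the last one stored.

```c
/* Two hypothetical structs whose first member has the same type,
   so they share a common initial sequence of one member. */
struct circle { int kind; double radius; };
struct label  { int kind; char text[16]; };

/* The completed union type is visible to everything below. */
union shape {
    struct circle c;
    struct label  l;
};

/* Inspects the common initial part through the union object itself. */
int shape_kind(const union shape *s) {
    return s->l.kind;   /* permitted even if s->c was stored last */
}

/* Stores through one member, then reads the shared field via the other. */
int demo_kind(void) {
    union shape s;
    s.c.kind = 1;
    s.c.radius = 2.5;
    return shape_kind(&s);
}
```

Here demo_kind() returns 1, because kind occupies the same storage in both structs and the read names the union member explicitly.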

C++03 ([class.mem]/16):

If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Other versions of the two standards have similar language; since C++11 the terminology used is standard-layout rather than POD.

Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.

But please note the phrase "anywhere that a declaration of the completed type of the union is visible": this clause exists in C11 but nowhere in the C++ drafts for 2003, 2011, or 2014 (all nearly identical, except that later versions replace "POD" with the new term "standard-layout"). In any case, the 'visible declaration of the union type' bit is entirely absent from the corresponding section of every C++ standard.

@loop and @Mints97, here - https://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).

[snipped - see my answer]

So, my questions are:

  • What does this mean? What is classed as a 'visible declaration'? Was this clause intended to narrow down - or expand up - the range of contexts in which such 'punning' has defined behaviour?

  • Are we to assume that this omission in C++ is deliberate?

  • What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?

  • If the difference is intentional, then what benefits or drawbacks are there to the two different treatments in C vs C++?

  • What, if any, interesting ramifications does it have at compile time or run time? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows.

I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.

Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of either Standard, quotes from committee members or other esteemed commentators, insights from developers who might have noticed a practical difference due to this (assuming any compiler even bothers to enforce C's added clause), and so on. The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!

Recommended answer

I've found my way through the labyrinth to some great sources on this, and I think I've got a pretty comprehensive summary of it. I'm posting this as an answer because it seems to explain both the (IMO very misguided) intention of the C clause and the fact that C++ does not inherit it. This will evolve over time if I discover further supporting material or the situation changes.

This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects, so I'll welcome clarifications/suggestions on how to improve this answer - or simply a better answer if anyone has one.

Through vaguely related threads, I found an answer by @tab on Stack Overflow, and much appreciated the links it contains to (illuminating, if not conclusive) GCC and Working Group defect reports.

The GCC link contains some interesting discussion and reveals a sizeable amount of confusion and conflicting interpretations on part of the Committee and compiler vendors - surrounding the subject of union member structs, punning, and aliasing in both C and C++.

At the end of that, we're linked to the main event - another BugZilla thread, Bug 65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:

C proposal N685 is the origin of the added clause regarding visibility of a union type declaration. Through what some claim (see GCC thread #2) is a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to allow relaxation of aliasing rules for "common initial sequence" structs within a TU aware of some union containing instances of said struct types, as we can see from this quote:

The proposed solution is to require that a union declaration be visible if aliases through a common initial sequence (like the above) are possible. Therefore the following TU provides this kind of aliasing if desired:

union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
     pst2->m1 = 2;
     pst3->m1 = 0;   /* might be an alias for pst2->m1 */
     return pst2->m1;
}

Judging by the GCC discussion and comments below such as @ecatmur's, this proposal - which seems to mandate speculatively allowing aliasing for any struct type that has some instance within some union visible to this TU - seems to have received great derision and rarely been implemented.

It's obvious how difficult it would be to satisfy this interpretation of the added clause without totally crippling many optimisations, and for little benefit: few coders would want this guarantee, and those who do can just turn on -fno-strict-aliasing (which IMO indicates larger problems). If implemented, this allowance is more likely to catch people out and to interact spuriously with other declarations of unions than to be useful.

Following on from this and a comment I made elsewhere, @Potatoswatter in this answer here on SO states that:

The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.

In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On being asked for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:

The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.

Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; compare to my final section below about C. At present, though, it is not (and again, IMO, this is good).

So, with the nefarious line from N685... 'cast aside'... we're back to assuming pointers into the common initial sequence are not special in terms of aliasing. Still, it's worth confirming what this paragraph in C++ means without it. Well, the 2nd GCC thread above links to another gem:

C++ defect 1719. This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard". This is either post-C++14 or at least after the final draft I have here (N3797), and it puts forward a significant, and in my opinion illuminating, rewrite of this paragraph's wording, as follows. The {comments in braces} are mine:

In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union]) of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1 [dcl.type.cv]). —end note]

This seems to clarify the meaning of the old wording: to me, it says that any specifically allowed 'punning' among union member structs with common initial sequences must be done via an instance of the parent union - rather than being based on the type of the structs (e.g. pointers to them passed to some function). This wording seems to rule out any other interpretation, a la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!
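
To illustrate that reading, here is a sketch in C of the N685 example rewritten so that every access goes through the union object. similar_func_via_union is a hypothetical name introduced only for this illustration; the union and struct definitions are copied from the N685 quote above.

```c
/* Same definitions as in the N685 example. */
union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

/* Hypothetical variant of similar_func: each access names a member of
   the union object, so the overlap is visible in the expressions. */
int similar_func_via_union(union utag *u) {
    u->st1.m1 = 2;
    u->st2.m1 = 0;      /* same storage as u->st1.m1 */
    return u->st1.m1;   /* reads the common initial part */
}
```

Called on any union utag object, this returns 0, since the second store lands on the same int that the return statement reads. The contested case is the original similar_func, where the same two stores go through unrelated-looking struct pointers and the compiler has only the types to reason from.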

The upshot is that - as nicely demonstrated by @ecatmur and in the GCC tickets - this leaves such union member structs by definition in C++, and practically in C, subject to the same strict aliasing rules as any other 2 officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, not including vague and unimaginably tedious-to-enforce "visibility" as attempted by N685 for C. By this definition, the main compilers have been behaving as intended for C++. As for C?

It's also very worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:

Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.

Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix.

Potatoswatter inspiringly concludes:

The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.

We can only hope!

Again, all further thoughts are welcome.
