union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?


Problem description


Background

Discussions on the mostly un-or-implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur ( http://stackoverflow.com/a/31557852/2757035 ), on an exemption for standard-layout structs having a "common initial sequence" of member types:

C11 (6.5.2.3 Structure and union members; Semantics):

[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

C++03 ([class.mem]/16):

If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Other versions of the two standards have similar language; since C++11 the terminology used is standard-layout rather than POD.
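As a minimal sketch of the part both passages agree on (inspecting the shared leading member through the union object itself), the C below writes one member struct and reads the common initial member of the other. The struct layouts are borrowed from the N685 example quoted further down; the helper names `store_as_tag1` and `read_m1_as_tag2` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct tag1 { int m1; double d2; };
struct tag2 { int m1; char c2; };

/* Both member structs start with an int, so they share a common initial sequence. */
union utag {
    struct tag1 st1;
    struct tag2 st2;
};

/* Makes st1 the member the union currently contains. */
void store_as_tag1(union utag *u, int m1, double d2) {
    assert(u != NULL);
    u->st1.m1 = m1;
    u->st1.d2 = d2;
}

/* Inspects the common initial part via st2, going through the union object. */
int read_m1_as_tag2(const union utag *u) {
    assert(u != NULL);
    return u->st2.m1;
}
```

Calling `store_as_tag1(&u, 42, 1.5)` and then `read_m1_as_tag2(&u)` yields 42. Every access here goes through the union, so it does not depend on how the "visible declaration" clause is interpreted.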

Please note the phrase "anywhere that a declaration of the completed type of the union is visible": this clause exists in C11 but nowhere in the C++ drafts for 2003, 2011, or 2014 (all nearly identical, except that later versions replace "POD" with the new term "standard-layout"). In any case, the 'visible declaration of union type' bit is totally absent from every C++ standard I've seen.

@loop and @Mints97, here - http://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).

Standards discussions around this

[snipped - see my answer]

Questions

From this, then, my questions are [edit: were!]:

  • What does this mean? What is classed as a 'visible declaration'? Was this clause added for reasons of aliasing, e.g., any function wanting to pun two member structs A and B must be passed the union U that contains them, rather than the 'naked' types of both - so that it knows they can be aliased for each other? (Defeating the point of having them in a union, to my eyes.) [edit] As far as I can tell, this seems to be the most likely interpretation, but the C Standard is unfortunately vague, so I can't be sure. I think, at best, one or both standards need their wording seriously improved. [/edit]

  • Are we to assume that this omission in C++ is very deliberate? If so, thank goodness [wipes brow and goes back to my project that might end up using this idiom]

  • What is the reason for C++ allowing this when C does not? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?

  • If this is intentional, then other than the obvious (guaranteeing such punning is allowed and not relegated to implementation-defined), what benefits arise from the lack of this requirement?

  • What, if any, interesting ramifications does it have for compilation (including optimisation, linking, etc.) or any other facet of the resulting program? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows:

I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.

Well, here I am! I'm very interested in any thoughts about this, especially: other relevant parts of either Standard, quotes from committee members or other esteemed commentators, insights from developers who might have noticed a practical difference due to this (assuming any compiler even bothers to enforce C's added restriction), etc.

[ Even if this is just an accidental omission, I'd need to know that asap, so I can consider refactoring my code. To be honest, I probably won't and will confine myself to g++ - which allows this via falling back to the bytewise (object) reinterpretation it inherits from its C incarnation ( GCC impl-def => g++ impl-def ) ... but I'd vastly prefer to remain fully portable if I can. ]
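For readers who want the bytewise route without relying on any one compiler's documented behaviour, one commonly used option is to copy the bytes with `memcpy`. The sketch below is an assumption about what such a fallback could look like, not something taken from the question; `read_leading_int` is a hypothetical helper, and it relies only on the C guarantee that a struct's first member sits at offset 0.

```c
#include <assert.h>
#include <string.h>

struct tag1 { int m1; double d2; };
struct tag2 { int m1; char c2; };

/* Copies the leading int out of any object whose first member is an int.
 * memcpy reads the object representation, so no struct-typed lvalue is
 * formed and the aliasing rules for tag1 versus tag2 never come into play. */
int read_leading_int(const void *obj) {
    int value;
    assert(obj != NULL);
    memcpy(&value, obj, sizeof value);
    return value;
}
```

The same helper works for a `struct tag1` or a `struct tag2`, at the cost of losing the type checking that member access through a union provides.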

But, anyway, let's not make this about me - what matters most from SO's perspective is generating a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!

Solution

I've found my way through the labyrinth to some great sources on this, and I think that - thanks to efforts of far more perseverant people - I've finally got a proper summary of it. I'm posting this as an answer because it seems to explain both the intention of the C clause and C++'s omission thereof. This will evolve over time if I discover further supporting material for it.

Of course, I'll welcome clarifications/suggestions on how to improve this answer. Or if anyone has a better one, I'll accept that. If I've interpreted any of this wrongly, tell me! This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects.

Finally, some concrete commentary

Anyway, through vaguely related threads, I found @tab's following answer and much appreciated the contained links to (illuminating, if not conclusive) GCC and Working Group defect reports: http://stackoverflow.com/a/19807355

The GCC link contains some interesting discussion and reveals a sizeable amount of Committee and vendor confusion/conflicting interpretations - around union member structs, punning, and aliasing - spanning C and C++.

At the end of that, we're then linked to the main event - another BugZilla thread, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:

Origin of the added line in C99

C proposal N685 ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n685.htm ): this is the origin of the added clause regarding visibility of a union type declaration. Through what some claim (see GCC thread #2) is a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to allow relaxation of aliasing rules for "common initial sequence" structs within a TU aware of some union containing instances of said struct types:

The proposed solution is to require that a union declaration be visible if aliases through a common initial sequence (like the above) are possible. Therefore the following TU provides this kind of aliasing if desired:

union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
    pst2->m1 = 2;
    pst3->m1 = 0;   /* might be an alias for pst2->m1 */
    return pst2->m1;
}

Judging by the GCC discussion and comments below, this proposal - which seems to mandate speculatively allowing aliasing for any struct type that has some instance within some union visible to this TU - seems to have received great derision and rarely been implemented, as evidenced by ecatmur. It's obvious how difficult this is to do without crippling many optimisations, and for little benefit, as few coders would want this guarantee (if I did, I'd just turn on -fno-strict-aliasing). It's more likely to just catch people out and spuriously interact with other declarations of unions.
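To make the optimisation concern concrete, here is a self-contained sketch of the N685 function called with two unrelated objects, which is the uncontroversial case. The struct definitions are hoisted out of the union so the block also reads naturally to C++ eyes; that rearrangement is mine, not N685's.

```c
#include <assert.h>
#include <stddef.h>

struct tag1 { int m1; double d2; };
struct tag2 { int m1; char c2; };

/* A union containing both structs is visible in this translation unit. */
union utag {
    struct tag1 st1;
    struct tag2 st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
    assert(pst2 != NULL && pst3 != NULL);
    pst2->m1 = 2;
    pst3->m1 = 0;      /* under N685's reading, might alias pst2->m1 */
    return pst2->m1;   /* a strict-aliasing optimiser may fold this to 2 */
}
```

With distinct objects `a` and `b`, `similar_func(&a, &b)` returns 2 and leaves `b.m1` at 0. Nothing here exercises the case where both pointers refer to members of the same union object; that is exactly where N685's reading and compiler practice diverge, so no result is claimed for it.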

Omission of the line from C++

Following on from this and a comment I made elsewhere, @Potatoswatter - http://stackoverflow.com/a/19805106 - states that:

The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.

In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On asking for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:

The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.

Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; refer to my final section on C. At present, it is not.

What does this mean for C++? (and, in practical terms, C implementations)

So, with the nefarious line from N685... 'cast aside'... we're back to assuming pointers into the common initial sequence are not special in terms of aliasing - or at best implementation-defined. Still, it's worth confirming what this paragraph in C++ means without it. Well, the 2nd GCC thread above links to another gem:

C++ defect 1719: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1719 . This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard" - cite ( https://social.msdn.microsoft.com/Forums/sqlserver/en-US/700569a7-18d3-4ddd-af0b-62bfe5aaf4d1/what-is-the-fastest-way-to-find-a-defect-in-the-c-standard-core-language-defect-report?forum=vcgeneral ). This is either post C++14 or at least after the final draft I have here (N3797) - and puts forward a significant, and in my opinion illuminating, rewrite of this paragraph's wording, as follows. The {comments in braces} are mine and flag what I consider to be the important changes:

In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union]) of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1 [dcl.type.cv]). —end note]

This clarifies the meaning of the previous wording: I read it as saying any specifically allowed 'punning' (reading inactive union member) of member structs with common initial sequences must be done via an instance of that union - rather than any vague concept of the union's type. This much clearer wording seems to rule out any other interpretation a la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!
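Read that way, the portable form of the N685 function takes the union itself rather than two 'naked' struct pointers. The following is only a sketch under that reading; `union_func` is a hypothetical name, and the struct layouts are the ones from N685.

```c
#include <assert.h>
#include <stddef.h>

struct tag1 { int m1; double d2; };
struct tag2 { int m1; char c2; };

union utag {
    struct tag1 st1;
    struct tag2 st2;
};

/* Every access names the union object, so the compiler can see that
 * st1.m1 and st2.m1 occupy the same storage. */
int union_func(union utag *u) {
    assert(u != NULL);
    u->st1.m1 = 2;
    u->st2.m1 = 0;      /* overwrites the shared leading int */
    return u->st1.m1;   /* reads the common initial sequence: 0 */
}
```

Here the second store is visibly to the same storage as the first, so the function returns 0, in contrast to the two-pointer version, where an optimiser may assume the stores are unrelated.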

The upshot is that - as nicely demonstrated by @ecatmur and in the GCC tickets - this leaves such union member structs, by definition in C++ and practically in C, subject to the same strict aliasing rules as any other two officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, and does not include the vague and unimaginably tedious-to-enforce "visibility" attempted by N685 for C. By this definition, the main compilers have been behaving as intended for C++. As for C?

Possible reversal of this line in C / clarification in C++

It's also very worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:

Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.

Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix.

Potatoswatter inspiringly concludes:

The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.

We can only hope!

Well, did I do it right? It's been quite a while since I wrote an essay, and I am extremely tired today (the two are at least partially related). All thoughts are welcome!
