Is the strict aliasing rule incorrectly specified?

Question

As previously established, a union of the form

union some_union {
    type_a member_a;
    type_b member_b;
    ...
};

with n members comprises n + 1 objects in overlapping storage: one object for the union itself and one object for each union member. It is clear that you may freely read and write any union member in any order, even when reading a union member that was not the last one written to. The strict aliasing rule is never violated, as the lvalue through which you access the storage has the correct effective type.

This is further supported by footnote 95, which explains how type punning is an intended use of unions.
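
As a minimal sketch of the kind of type punning the footnote describes, the following helper stores a float in a union and reads the same bytes back through a different member, using the member access operator directly. The name float_bits and the use of uint32_t are illustrative assumptions, not something taken from the question; the sketch also assumes that float and uint32_t have the same size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the question): returns the object
   representation of a float as a 32-bit unsigned integer by reading a
   different union member than the one last written. */
uint32_t float_bits(float value)
{
    union {
        float f;
        uint32_t u;
    } pun;

    /* Assumption: float and uint32_t occupy the same number of bytes. */
    assert(sizeof(float) == sizeof(uint32_t));

    pun.f = value;   /* store through member f */
    return pun.u;    /* read through member u, via the "." operator */
}
```

On a platform where float is an IEEE 754 single precision number, float_bits(1.0f) yields 0x3F800000. Every access here names the union object itself, which is the case the footnote is usually taken to cover.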

A typical example of the optimizations enabled by the strict aliasing rule is this function:

int strict_aliasing_example(int *i, float *f)
{
    *i = 1;
    *f = 1.0;
    return (*i);
}

which the compiler may optimize to something like

int strict_aliasing_example(int *i, float *f)
{
    *i = 1;
    *f = 1.0;
    return (1);
}

because it can safely assume that the write to *f does not affect the value of *i.

However, what happens when we pass two pointers to members of the same union? Consider this example, assuming a typical platform where float is an IEEE 754 single precision floating point number and int is a 32 bit two's complement integer:

int breaking_example(void)
{
    union {
        int i;
        float f;
    } fi;

    return (strict_aliasing_example(&fi.i, &fi.f));
}

As previously established, fi.i and fi.f refer to an overlapping memory region. Reading and writing them in any order is unconditionally legal (writing is only legal once the union has been initialized). In my opinion, the previously discussed optimization performed by all major compilers yields incorrect code, as the two pointers of different types legally point to the same location.

I somehow can't believe that my interpretation of the strict aliasing rule is correct. It doesn't seem plausible that the very optimization the strict aliasing rule was designed for is not possible due to the aforementioned corner case.

Please tell me why I'm wrong.

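A related question showed up during my research.

Before adding your own answer, please read all existing answers and their comments to make sure your answer adds a new argument.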

Recommended answer

Starting with your example:

int strict_aliasing_example(int *i, float *f)
{
    *i = 1;
    *f = 1.0;
    return (*i);
}

Let's first acknowledge that, in the absence of any unions, this would violate the strict aliasing rule if i and f both point to the same object; assuming the object has no effective type, then *i = 1 sets the effective type to int and *f = 1.0 then sets it to float, and the final return (*i) then accesses an object with effective type of float via an lvalue of type int, which is clearly not allowed.
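
The effective-type bookkeeping described above can be sketched with allocated storage, which has no declared type. This is only an illustration under that assumption; effective_type_demo is a hypothetical name, and the sketch deliberately stops before the access that the paragraph identifies as not allowed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical demo: every read uses the same type as the store
   that precedes it, so each access matches the effective type. */
int effective_type_demo(void)
{
    /* Allocated storage has no declared type, hence no effective type yet. */
    void *storage = malloc(sizeof(int) > sizeof(float) ? sizeof(int)
                                                       : sizeof(float));
    assert(storage != NULL);

    int *i = storage;
    float *f = storage;

    *i = 1;              /* effective type becomes int */
    int first = *i;      /* int object read through an int lvalue */

    *f = 2.0f;           /* effective type becomes float */
    float second = *f;   /* float object read through a float lvalue */

    /* Reading *i at this point would access a float object through an
       int lvalue, which 6.5¶7 does not allow, so it is left out. */

    free(storage);
    return first + (int)second;   /* 1 + 2 */
}
```

The function returns 3. Adding a final read of *i after the store to *f would reproduce the access pattern of strict_aliasing_example with both pointers referring to the same object.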

The question is about whether this would still amount to a strict-aliasing violation if both i and f point to members of the same union. On union member access via the "." member access operator, the specification says (6.5.2.3):

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member (95) and is an lvalue if the first expression is an lvalue.

Footnote 95, referred to above, says:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

This is clearly intended to allow type punning via a union, but two things should be noted. (1) Footnotes are non-normative; that is, they are not supposed to prescribe behaviour, but rather to clarify the intention of some part of the text in accordance with the rest of the specification. (2) Compiler vendors deem this allowance for type punning via a union to apply only to access via the union member access operator, since otherwise strict aliasing would be pretty meaningless: just about any potentially aliasing accesses could also be accesses to members of the same union.

Your example stores via a pointer to a non-existing or at least non-active union member, and thereby either commits a strict aliasing violation (since it accesses the member that is active using an lvalue of unsuitable type) or uses an lvalue which does not denote an object (since the object corresponding to the non-active member doesn't exist) - it could be argued either way and the standard is not particularly clear, but either interpretation means that your example has undefined behaviour.

(I might add that I cannot see how the footnote allowing type punning via a union describes behaviour that is otherwise inherent in the specification; that is, it seems to break the ISO rule that footnotes do not prescribe behaviour, since nothing else in the specification seems to make any allowance for type punning via a union. Furthermore, it is something of a stretch to read the normative text as requiring that, for this form of type punning, the access be done immediately via the union type.)

There is often confusion caused by another part of the specification, however, also in 6.5.2.3:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.
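
A minimal sketch of what this guarantee covers, with access going through the union and the member access operator. All the names here (struct circle, struct rect, union shape, shape_kind and the constructors) are hypothetical and chosen only for illustration; the two structures share a single int member as their common initial sequence.

```c
#include <assert.h>

/* Hypothetical types: both structures begin with an int member, so
   "kind" is their common initial sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; float width; float height; };

union shape {
    struct circle c;
    struct rect r;
};

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

union shape make_circle(double radius)
{
    union shape sh;
    sh.c.kind = KIND_CIRCLE;
    sh.c.radius = radius;
    return sh;
}

union shape make_rect(float width, float height)
{
    union shape sh;
    sh.r.kind = KIND_RECT;
    sh.r.width = width;
    sh.r.height = height;
    return sh;
}

int shape_kind(union shape sh)
{
    /* Inspect the common initial part through member c, whichever
       structure the union currently holds. */
    int kind = sh.c.kind;
    assert(kind == KIND_CIRCLE || kind == KIND_RECT);
    return kind;
}
```

For instance, shape_kind(make_rect(2.0f, 3.0f)) evaluates to KIND_RECT even though the value was stored through member r and inspected through member c. The sketch does not take a position on whether the same inspection would be permitted through two unrelated pointers.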

Although this does not apply to your example, since there is no common initial sequence, I've seen people read this as a general rule governing type punning (at least when a common initial sequence is involved). They believe it implies that such type punning should be possible using two pointers to different union members whenever the complete union declaration is visible (since words to that effect appear in the paragraph quoted above). However, I would point out that the paragraph above still only applies to union member access via the "." operator. The problem with reconciling this understanding is that, in that case, the complete union declaration must be visible anyway, since otherwise you would not be able to refer to the union members. I think that it is this glitch in the wording, combined with similarly bad wording in Example 3 ("The following is not a valid fragment (because the union type is not visible ...)", when union visibility is not really the deciding factor), that makes some people construe the common-initial-sequence exception as intended to apply globally, as an exception to the strict aliasing rule, and not just for member access via the "." operator. Having come to this conclusion, a reader might then interpret the footnote regarding type punning as applying globally too, and some do: see the discussion on this GCC bug for example (note that the bug has been in SUSPENDED state for a long time).

(Incidentally, I am aware of several compilers that do not implement the "global common initial sequence" rule. I am not specifically aware of any compilers which implement the "global common initial sequence" rule while not also allowing arbitrary type punning, but that doesn't mean such compilers don't exist. The committee response to Defect Report 257 suggests that they intend the rule to be global. However, I personally think the idea that the mere visibility of a type should change the semantics of code which doesn't refer to that type is deeply flawed, and I know others agree.)

At this point you could well question how reading a non-active union member via the member-access operator doesn't violate strict aliasing, if doing the same via a pointer does so. This is again an area where the specification is somewhat hazy; the key is perhaps in deciding which lvalue is responsible for the access. For instance, if a union object u has a member a and I read it via the expression u.a, then we could interpret this as either an access of the member object (a) or as merely an access of the union object (u) which the member value is then extracted from. In the latter case, there is no aliasing violation since it is specifically allowed to access an object (i.e. the active member object) via an lvalue of aggregate type containing a suitable member (6.5¶7). Indeed, the definition of the member access operator in 6.5.2.3 does support this interpretation, if somewhat weakly: the value is that of the named member - while it is potentially an lvalue, it is not necessary to access the object referred to by that lvalue in order to obtain the value of the member, and so strict aliasing violation is avoided. But this is again stretching a little.

(To me it seems under-specified, generally, just when an object has "its stored value accessed ... by an lvalue expression" as per 6.5¶7; we can of course make a reasonable determination for ourselves, but then we must be careful to allow for type-punning via unions as per above, or otherwise be willing to disregard footnote 95. Despite the often unnecessary verbiage, the specification is sometimes lacking in necessary detail).

Arguments about union semantics invariably refer to DR 236 at some point. Indeed, your example code is superficially very similar to the code in that Defect Report. I would note that:

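  1. "The committee believes that Example 2 violates the aliasing rules in 6.5 paragraph 7": this does not contradict my reasoning above;
  2. "In order to not violate the rules, function f in example should be written as": this supports my reasoning above; you must use the union object (and the "." operator) to change the active member type, otherwise you are accessing a non-existent member (since the union can contain only one member at a time);
  3. The example in DR 236 is not about type punning. It is about whether it is OK to assign to a non-active union member via a pointer to that member. The code in question is subtly different from the code discussed here, since it does not attempt to access the "original" union member again after writing to the second member. Thus, despite the structural similarity of the example code, the Defect Report is largely unrelated to your question.
  4. The Committee Response in DR 236 claims that "Both programs invoke undefined behavior". This, however, is not supported by the discussion, which shows only that Example 2 invokes undefined behaviour. I believe the response is erroneous.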
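
To illustrate point 2, here is a sketch of how the function from the question might be rewritten so that the active member is changed through the union object rather than through pointers to its members. This is not the code from DR 236; the names union int_or_float and union_aware_example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical named version of the union used in the question. */
union int_or_float {
    int i;
    float f;
};

/* Hypothetical rewrite of strict_aliasing_example: the function receives
   the union itself, so every access goes through the union object. */
int union_aware_example(union int_or_float *u)
{
    assert(u != NULL);

    u->i = 1;      /* i becomes the active member */
    u->f = 1.0f;   /* f becomes the active member, changed via the union */
    return u->i;   /* member access: reinterprets the bytes of f (footnote 95) */
}
```

On the platform assumed in the question (IEEE 754 single precision float, 32-bit int), the call returns 0x3F800000, the bit pattern of 1.0f, rather than 1. Because both members are named through the union, a compiler cannot treat the store to f as unrelated to the later read of i.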
