Unions and strict aliasing in C11

Question

Assume I have a union like this:

union buffer {
  struct { T* data; int count; int capacity; };
  struct { void* data; int count; int  capacity; } __type_erased;
};

Will I get into trouble if I mix reads/writes to the anonymous struct members and __type_erased members under C11 aliasing rules?

More specifically, I am interested in the behaviour that occurs if the components are accessed independently (e.g. via different pointers). To illustrate:

grow_buffer(&buffer.__type_erased);
buffer.data[buffer.count] = ...

I have read all the relevant questions I could find, but I am still not 100% clear on this: some people suggest that such behaviour is undefined, while others say that it is legal. Furthermore, the information I find is a mix of C++, C99, C11 etc. rules that is quite difficult to digest. Here, I am interested specifically in the behaviour mandated by C11 and exhibited by popular compilers (Clang, GCC).

I have now performed some experiments with multiple compilers and decided to share my findings in case someone runs into a similar issue. The background of my question is that I was trying to write a user-friendly, high-performance generic dynamic array implementation in plain C. The idea is that array operations are carried out using macros, and heavy-duty operations (like growing the array) are performed using an aliased type-erased template struct. E.g., I can have a macro like this:

#define ALLOC_ONE(A)\
    (_array_ensure_size(&A.__type_erased, A.count+1), A.count++)

This macro grows the array if necessary and returns the index of the newly allocated item. The spec (6.5.2.3) states that access to the same location via different union members is allowed. My interpretation is that, while _array_ensure_size() is not aware of the union type, the compiler should be aware that the member __type_erased can potentially be mutated by a side effect. That is, I'd assume that this should work. However, this seems to be a grey zone (and to be honest, the spec is really not clear about what constitutes a member access). Apple's latest Clang (clang-800.0.33.1) has no problems with it: the code compiles without warnings and runs as expected. However, when compiled with GCC 5.3.0, the code crashes with a segfault. In fact, I strongly suspect that GCC's behaviour is a bug. I tried making the union member mutation explicit by removing the mutable pointer reference and adopting a clear functional style, e.g.:

#define ALLOC_ONE(A) \
   (A.__type_erased = _array_ensure_size(A.__type_erased, A.count+1),\
    A.count++)

This again works with Clang, as expected, but crashes GCC again. My conclusion is that advanced type manipulation with unions is a grey area where one should tread carefully.
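One way to sidestep the grey area is to keep the type-erased helper away from the struct altogether. This is a sketch under stated assumptions, not the approach used in the question. The helper receives the data pointer by value as a void*, gets the capacity through an int*, and returns the new block. Every access then goes through an lvalue of the member's own type, so no two struct types alias each other. The names array_grow, BUFFER_OF and sum_first_n are hypothetical, and ALLOC_ONE here takes a different helper signature than the original macro.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the question): makes room for `needed`
   elements of `elem_size` bytes, updates *capacity, and returns the block,
   which realloc may have moved. */
void *array_grow(void *data, int needed, int *capacity, size_t elem_size)
{
    assert(needed >= 0 && elem_size > 0);
    if (needed > *capacity) {
        int new_capacity = *capacity ? *capacity * 2 : 8;
        while (new_capacity < needed)
            new_capacity *= 2;
        data = realloc(data, (size_t)new_capacity * elem_size);
        if (data == NULL)
            abort();    /* sketch only: real code would report the error */
        *capacity = new_capacity;
    }
    return data;
}

/* Declares a typed buffer; no union and no second struct type is involved. */
#define BUFFER_OF(T) struct { T *data; int count; int capacity; }

/* Grows the buffer if needed and yields the index of the new slot. */
#define ALLOC_ONE(A) \
    ((A).data = array_grow((A).data, (A).count + 1, \
                           &(A).capacity, sizeof *(A).data), \
     (A).count++)

/* Usage: append 0..n-1, then return the sum of the stored values. */
int sum_first_n(int n)
{
    BUFFER_OF(int) buf = { NULL, 0, 0 };
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int idx = ALLOC_ONE(buf);
        buf.data[idx] = i;
    }
    for (int i = 0; i < buf.count; i++)
        sum += buf.data[i];
    free(buf.data);
    return sum;
}
```

The trade-off is a longer argument list for the helper. In exchange, the code no longer depends on how a particular compiler treats access through a pointer to a union member.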

Recommended answer

The C11 standard states:

6.5.2.3 Structure and union members

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

So, as far as reading and writing union fields goes, this is correct in C11. However, strict aliasing is a type-based analysis, so a naive implementation can treat these reads and writes as independent. As I understand it, modern GCC can detect cases that involve union fields and avoid such errors.
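As a minimal sketch of the case footnote 95 covers, the following stores through one union member and reads through another, with both accesses naming the union object directly rather than going through separate pointers. The names float_bits and bits_of are hypothetical, and the example assumes that float is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float is 32 bits wide (IEEE 754 single precision). */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

union float_bits {          /* hypothetical name, not from the question */
    float    f;
    uint32_t u;
};

/* Store through member f, then read the same bytes through member u.
   Both accesses use the union object itself, not pointers to its members. */
uint32_t bits_of(float x)
{
    union float_bits fb;
    fb.f = x;
    return fb.u;
}
```

Under that assumption, bits_of(1.0f) yields 0x3F800000. The situation in the question differs in that the accesses happen through pointers to individual members, which is where a type-based analysis can lose track of the union.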

Also, remember that some uses of pointers to union members are invalid:

The following is not a valid fragment (because the union type is not visible within function f):

struct t1 { int m; };
struct t2 { int m; };
int f(struct t1 *p1, struct t2 *p2)
{
  if (p1->m < 0)
    p2->m = -p2->m;
  return p1->m;
}
int g()
{
  union {
    struct t1 s1;
    struct t2 s2;
  } u;
  /* ... */
  return f(&u.s1, &u.s2);
}
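For contrast, here is a hedged sketch of a variant in which the union type is declared at file scope and the function receives a pointer to the union itself, so every access is a member access on the union object. The names t1_or_t2, f_visible and g_visible are hypothetical. This illustrates the visibility point made above and is not text from the standard.

```c
#include <assert.h>
#include <stddef.h>

struct t1 { int m; };
struct t2 { int m; };

/* Declared at file scope, so the union type is visible inside f_visible. */
union t1_or_t2 {
    struct t1 s1;
    struct t2 s2;
};

/* Every access goes through the union object, so the compiler can see
   that s1.m and s2.m share storage. */
int f_visible(union t1_or_t2 *pu)
{
    assert(pu != NULL);
    if (pu->s1.m < 0)
        pu->s2.m = -pu->s2.m;
    return pu->s1.m;
}

int g_visible(int start)
{
    union t1_or_t2 u;
    u.s1.m = start;
    return f_visible(&u);
}
```

With this shape, g_visible(-5) returns 5, because the negation stored through s2 is read back through s1.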

In my opinion, using unions to read and write through different members is dangerous, and it is better to avoid it.
