Best practices for object oriented patterns with strict aliasing and strict alignment in C


Question

I've been writing embedded C code for many years now, and the newer generations of compilers and optimizations have certainly gotten a lot better with respect to their ability to warn about questionable code.

However, there is at least one (very common, in my experience) use case that continues to cause grief, where a common base type is shared between multiple structs. Consider this contrived example:

#include <stdio.h>

struct Base
{
    unsigned short t; /* identifies the actual structure type */
};

struct Derived1
{
    struct Base b; /* identified by t=1 */
    int i;
};

struct Derived2
{
    struct Base b; /* identified by t=2 */
    double d;
};


struct Derived1 s1 = { .b = { .t = 1 }, .i = 42 };
struct Derived2 s2 = { .b = { .t = 2 }, .d = 42.0 };

void print_val(struct Base *bp)
{
    switch(bp->t)
    {
    case 1: 
    {
        struct Derived1 *dp = (struct Derived1 *)bp;
        printf("Derived1 value=%d\n", dp->i);
        break;
    }
    case 2:
    {
        struct Derived2 *dp = (struct Derived2 *)bp;
        printf("Derived2 value=%.1lf\n", dp->d);
        break;
    }
    }
}

int main(int argc, char *argv[])
{
    struct Base *bp1, *bp2;

    bp1 = (struct Base*) &s1;
    bp2 = (struct Base*) &s2;
    
    print_val(bp1);
    print_val(bp2);

    return 0;
}

Per ISO/IEC 9899, the casts in the code above should be OK, as they rely on the first member of a structure sharing the same address as the containing structure. Clause 6.7.2.1-13 says so:

Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
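
The guarantee in that clause can be checked directly. The following is a minimal sketch that reuses the struct definitions from the example; the helper names derived1_as_base() and shares_address_with_base() are hypothetical and are not part of the original code. It also shows that the derived-to-base direction needs no cast at all if the address of the b member is taken.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

struct Base     { unsigned short t; };
struct Derived1 { struct Base b; int i; };

/* Hypothetical helper: upcast with no cast at all, by taking the
 * address of the embedded base member. */
struct Base *derived1_as_base(struct Derived1 *dp)
{
    return &dp->b;
}

/* Hypothetical helper: returns 1 when the object and its initial
 * member share an address, as the quoted clause requires. */
int shares_address_with_base(struct Derived1 *dp)
{
    /* No padding is permitted before the first member. */
    assert(offsetof(struct Derived1, b) == 0);
    return (void *)dp == (void *)derived1_as_base(dp);
}
```

Calling shares_address_with_base() on any struct Derived1 object should return 1 on a conforming implementation.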

The casts from derived to base work fine, but the cast back to the derived type within print_val() generates an alignment warning. However, this is known to be safe, as it is specifically the "vice versa" part of the clause above. The problem is that the compiler simply doesn't know that we've already guaranteed, via other means, that the structure is in fact an instance of the other type.

When compiled with gcc version 9.3.0 (Ubuntu 20.04) using flags -std=c99 -pedantic -fstrict-aliasing -Wstrict-aliasing -Wcast-align=strict -O3 I get:

alignment-1.c: In function ‘print_val’:
alignment-1.c:30:31: warning: cast increases required alignment of target type [-Wcast-align]
   30 |         struct Derived1 *dp = (struct Derived1 *)bp;
      |                               ^
alignment-1.c:36:31: warning: cast increases required alignment of target type [-Wcast-align]
   36 |         struct Derived2 *dp = (struct Derived2 *)bp;
      |                               ^

A similar warning occurs in clang 10.

Rework 1: pointer to pointer

A method used in some circumstances to avoid the alignment warning (when the pointer is known to be aligned, as is the case here) is to use an intermediate pointer-to-pointer. For instance:

struct Derived1 *dp = *((struct Derived1 **)&bp);

However this just trades the alignment warning for a strict aliasing warning, at least on gcc:

alignment-1a.c: In function ‘print_val’:
alignment-1a.c:30:33: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   30 |         struct Derived1 *dp = *((struct Derived1 **)&bp);
      |                                ~^~~~~~~~~~~~~~~~~~~~~~~~

The same is true if the cast is done as an lvalue; that is, *((struct Base **)&dp) = bp; also warns in gcc.

Notably, only gcc complains about this one - clang 10 seems to accept this either way without warning, but I'm not sure if that's intentional or not.

Rework 2: union of structures

Another way to rework this code is using a union. So the print_val() function can be rewritten something like:

void print_val(struct Base *bp)
{
    union Ptr
    {
        struct Base b;
        struct Derived1 d1;
        struct Derived2 d2;
    } *u;

    u = (union Ptr *)bp;
...

The various structures can be accessed using the union. While this works fine, the cast to a union is still flagged as violating alignment rules, just like the original example.

alignment-2.c:33:9: warning: cast from 'struct Base *' to 'union Ptr *' increases required alignment from 2 to 8 [-Wcast-align]
    u = (union Ptr *)bp;
        ^~~~~~~~~~~~~~~
1 warning generated.
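
A related variant avoids the cast, and so the warning, by creating every object as the union type in the first place. This is only a sketch under that assumption: the names union Any and value_of() are hypothetical, and every object becomes as large and as strictly aligned as the largest member. Because struct Derived1 and struct Derived2 share the common initial sequence { struct Base b; }, the tag may be inspected through either member while the union type is visible.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

struct Base     { unsigned short t; };
struct Derived1 { struct Base b; int i; };
struct Derived2 { struct Base b; double d; };

/* Hypothetical union type: every object is created as a union Any,
 * so no pointer cast is ever needed. */
union Any
{
    struct Derived1 d1;
    struct Derived2 d2;
};

/* Hypothetical helper: returns the payload as a double,
 * or -1.0 for an unknown tag. */
double value_of(const union Any *u)
{
    assert(u != NULL);
    /* Both members begin with struct Base b, so the tag can be read
     * through d1 regardless of which member was stored. */
    switch (u->d1.b.t)
    {
    case 1:  return (double)u->d1.i;
    case 2:  return u->d2.d;
    default: return -1.0;
    }
}
```

The trade-off is memory: a Derived1 stored this way occupies the space of a Derived2.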

Rework 3: union of pointers

Rewriting the function as follows compiles cleanly in both gcc and clang:

void print_val(struct Base *bp)
{
    union Ptr
    {
        struct Base *bp;
        struct Derived1 *d1p;
        struct Derived2 *d2p;
    } u;

    u.bp = bp;

    switch(u.bp->t)
    {
    case 1:
    {
        printf("Derived1 value=%d\n", u.d1p->i);
        break;
    }
    case 2:
    {
        printf("Derived2 value=%.1lf\n", u.d2p->d);
        break;
    }
    }
}

There seems to be conflicting information out there as to whether this is truly valid. In particular, an older aliasing write-up at https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html specifically calls out a similar construct as being invalid (see Casting through a union (3) in that link).

In my understanding, because pointer members of the union all share a common base type, this doesn't actually violate any aliasing rules, because all accesses to struct Base will in fact be done via an object of type struct Base - whether by dereferencing the bp union member or by accessing the b member object of the d1p or d2p. Either way it is accessing the member correctly via an object of type struct Base - so as far as I can tell, there is no alias.

Specific questions:

  1. Is the union-of-pointers suggested in rework 3 a portable, safe, standards compliant, acceptable method of doing this?
  2. If not, is there a method that is fully portable and standards compliant, and does not rely on any platform-defined/compiler-specific behavior or options?

It seems to me that since this pattern is fairly common in C code (in the absence of true OO constructs like in C++) that it should be more straightforward to do this in a portable way without getting warnings in one form or another.

Thanks in advance!

Update:

Using an intermediate void* may be the "right" way to do this:

struct Derived1 *dp = (void*)bp;

This certainly works, but it allows any conversion at all, regardless of type compatibility. (I suppose the weaker type system of C is fundamentally to blame for this; what I really want is an approximation of C++ and its static_cast<> operator.)
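
One way to recover some of that type checking is to hide the void * conversion inside a small typed helper, so that only a struct Base * is accepted as input and the tag is verified before the pointer is handed back. This is a rough sketch, not a standard idiom; the names TYPE_DERIVED1 and derived1_from_base() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

struct Base     { unsigned short t; };
struct Derived1 { struct Base b; int i; };

enum { TYPE_DERIVED1 = 1 };  /* hypothetical name for the tag value t=1 */

/* Hypothetical helper: the parameter type restricts callers to
 * struct Base *, so passing an unrelated pointer type draws a
 * diagnostic. Returns NULL when the tag does not match. */
struct Derived1 *derived1_from_base(struct Base *bp)
{
    if (bp == NULL || bp->t != TYPE_DERIVED1)
        return NULL;
    return (void *)bp;  /* void * converts implicitly to struct Derived1 * */
}
```

The conversion is still only as trustworthy as the tag: the helper assumes that any struct Base with t equal to 1 really is the first member of a struct Derived1.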

However, my fundamental question (misunderstanding?) about strict aliasing rules remains:

Why does using a union type and/or pointer-to-pointer violate strict aliasing rules? In other words what is fundamentally different between what is done in main (taking address of b member) and what is done in print_val() other than the direction of the conversion? Both yield the same situation - two pointers that point to the same memory, which are different struct types - a struct Base* and a struct Derived1*.

It would seem to me that if this were violating strict aliasing rules in any way, the introduction of an intermediate void* cast would not change the fundamental problem.

Recommended answer

You can avoid the compiler warning by casting to void * first:

struct Derived1 *dp = (struct Derived1 *) (void *) bp;

(After the cast to void *, the conversion to struct Derived1 * is automatic in the above declaration, so you could remove the cast.)
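
Applied to the whole dispatch function from the question, the conversion might look like the sketch below. To keep the example easy to check, it formats into a caller-supplied buffer instead of printing; the name format_val() and the -1 return for an unknown tag are assumptions, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>

struct Base     { unsigned short t; };
struct Derived1 { struct Base b; int i; };
struct Derived2 { struct Base b; double d; };

/* Hypothetical variant of print_val() that formats into buf.
 * Returns the snprintf() result, or -1 for an unknown tag. */
int format_val(struct Base *bp, char *buf, size_t len)
{
    assert(bp != NULL && buf != NULL);

    switch (bp->t)
    {
    case 1:
    {
        /* The value of bp is converted; bp itself is never reinterpreted. */
        struct Derived1 *dp = (void *)bp;
        return snprintf(buf, len, "Derived1 value=%d", dp->i);
    }
    case 2:
    {
        struct Derived2 *dp = (void *)bp;
        return snprintf(buf, len, "Derived2 value=%.1f", dp->d);
    }
    default:
        return -1;
    }
}
```

As in the question, this is only valid when the tag truthfully identifies the enclosing object, which the compiler cannot verify.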

The methods of using a pointer-to-a-pointer or a union to reinterpret a pointer are not correct; they violate the aliasing rule, as a struct Derived1 * and a struct Base * are not suitable types for aliasing each other. Do not use those methods.

(Due to C 2018 6.2.5 28, which says "… All pointers to structure types shall have the same representation and alignment requirements as each other…," an argument can be made that reinterpreting one pointer-to-a-structure as another through a union is supported by the C standard. Footnote 49 says "The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions." At best, however, this is a kludge in the C standard and should be avoided when possible.)

Why does using a union type and/or pointer-to-pointer violate strict aliasing rules? In other words what is fundamentally different between what is done in main (taking address of b member) and what is done in print_val() other than the direction of the conversion? Both yield the same situation - two pointers that point to the same memory, which are different struct types - a struct Base* and a struct Derived1*.

It would seem to me that if this were violating strict aliasing rules in any way, the introduction of an intermediate void* cast would not change the fundamental problem.

The strict aliasing violation occurs in aliasing the pointer, not in aliasing the structure.

If you have a struct Derived1 *dp or a struct Base *bp and you use it to access a place in memory where there actually is a struct Derived1 or, respectively, a struct Base, then there is no aliasing violation because you are accessing an object through an lvalue of its type, which is allowed by the aliasing rule.

However, this question suggested aliasing a pointer. In *((struct Derived1 **)&bp);, &bp is the location where there is a struct Base *. This address of a struct Base * is converted to the address of a struct Derived1 *, and then * forms an lvalue of type struct Derived1 *. The expression is then used to access a struct Base * using a type of struct Derived1 *. There is no match for that in the aliasing rule; none of the types it lists for accessing a struct Base * is struct Derived1 *.
