Portability of using union for conversion


Problem description


I want to represent a 32-bit number using RGBA values. Is it portable to generate the value of that number using a union? Consider this C code:

union pixel {
    uint32_t value;
    uint8_t RGBA[4];
};

This compiles fine, and I'd like to use it instead of a bunch of functions. But is it safe?

Solution

Using unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). But "type punning" via unions has hardware architecture endianness considerations.

Writing one member of a union and then reading a different member is called "type punning". It is not directly portable, because the result depends on the endianness of the hardware, but it is otherwise just fine. The C standards have NOT been great about making it clear that this is allowed, but apparently it is. Read these answers and sources:

  1. Is type-punning through a union unspecified in C99, and has it become specified in C11?
  2. Unions and type-punning
  3. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type%2Dpunning - type punning is allowed in gcc C and C++

Additionally, the C18 draft, N2176 ISO/IEC 9899:2017, states the following in footnote 97 of section "6.5.2.3 Structure and union members":

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.


So, having

typedef union my_union_u
{
    uint32_t value;
    /// A byte array large enough to hold the largest of any value in the union.
    uint8_t bytes[sizeof(uint32_t)];
} my_union_t;

as a means of translating value into bytes is just fine in C. In C++ it works as a GNU gcc extension (but not as part of the C++ standard). See @Christoph's explanation in his answer here:

GNU extensions to standard C++ (and to C90) do explicitly allow type-punning with unions. Other compilers that don't support GNU extensions may also support union type-punning, but it's not part of the base language standard.


Download the code: you can download and run all the code below from my eRCaGuy_hello_world repo here: "type_punning.c". gcc build and run commands for both C and C++ are found in the comments at the very top of the file.


So, you can do something like this to read the individual bytes out of the uint32_t value:

TECHNIQUE 1: union-based type punning:

my_union_t u;

// write to uint32_t value
u.value = 1234;

// read individual bytes from uint32_t value
printf("1st byte = 0x%02X\n", (u.bytes)[0]);
printf("2nd byte = 0x%02X\n", (u.bytes)[1]);
printf("3rd byte = 0x%02X\n", (u.bytes)[2]);
printf("4th byte = 0x%02X\n", (u.bytes)[3]);

Sample output:

  1. On a little-endian architecture:

    1st byte = 0xD2
    2nd byte = 0x04
    3rd byte = 0x00
    4th byte = 0x00
    

  2. On a big-endian architecture:

    1st byte = 0x00
    2nd byte = 0x00
    3rd byte = 0x04
    4th byte = 0xD2
    

You can use raw pointers to obtain bytes from variables too, but this technique also has hardware architecture endianness issues.

The same bytes can also be read withOUT a union, by casting the address of the value to a byte pointer, like this:

TECHNIQUE 2: reading through raw pointers:

uint32_t value = 1234;
uint8_t *bytes = (uint8_t *)&value;

// read individual bytes from uint32_t value
printf("1st byte = 0x%02X\n", bytes[0]);
printf("2nd byte = 0x%02X\n", bytes[1]);
printf("3rd byte = 0x%02X\n", bytes[2]);
printf("4th byte = 0x%02X\n", bytes[3]);

Sample output:

  1. On a little-endian architecture:

    1st byte = 0xD2
    2nd byte = 0x04
    3rd byte = 0x00
    4th byte = 0x00
    

  2. On a big-endian architecture:

    1st byte = 0x00
    2nd byte = 0x00
    3rd byte = 0x04
    4th byte = 0xD2
    

You can use bitmasks and bit-shifting to avoid hardware architecture endianness portability issues.

To avoid endianness issues which exist with both the union type punning and raw pointer approaches above, you can use something like the following instead. This avoids endianness differences between hardware architectures:

TECHNIQUE 3.1: use bit-masks and bit shifting:

uint32_t value = 1234;

uint8_t byte0 = (value >> 0)  & 0xff;
uint8_t byte1 = (value >> 8)  & 0xff;
uint8_t byte2 = (value >> 16) & 0xff;
uint8_t byte3 = (value >> 24) & 0xff;

printf("1st byte = 0x%02X\n", byte0);
printf("2nd byte = 0x%02X\n", byte1);
printf("3rd byte = 0x%02X\n", byte2);
printf("4th byte = 0x%02X\n", byte3);

Sample output (the above technique is endianness-independent!):

  1. On all architectures, both big-endian AND little-endian:

    1st byte = 0xD2
    2nd byte = 0x04
    3rd byte = 0x00
    4th byte = 0x00
    

OR:

TECHNIQUE 3.2: use a convenience macro to do bit-masks and bit shifting:

#define BYTE(value, byte_num) ((uint8_t)(((value) >> (8*(byte_num))) & 0xff))

uint32_t value = 1234;

uint8_t byte0 = BYTE(value, 0);
uint8_t byte1 = BYTE(value, 1);
uint8_t byte2 = BYTE(value, 2);
uint8_t byte3 = BYTE(value, 3);

// OR

uint8_t bytes[] = {
    BYTE(value, 0), 
    BYTE(value, 1), 
    BYTE(value, 2), 
    BYTE(value, 3), 
};

printf("1st byte = 0x%02X\n", byte0);
printf("2nd byte = 0x%02X\n", byte1);
printf("3rd byte = 0x%02X\n", byte2);
printf("4th byte = 0x%02X\n", byte3);
printf("---------------\n");
printf("1st byte = 0x%02X\n", bytes[0]);
printf("2nd byte = 0x%02X\n", bytes[1]);
printf("3rd byte = 0x%02X\n", bytes[2]);
printf("4th byte = 0x%02X\n", bytes[3]);

Sample output (the above technique is endianness-independent!):

  1. On all architectures, both big-endian AND little-endian:

    1st byte = 0xD2
    2nd byte = 0x04
    3rd byte = 0x00
    4th byte = 0x00
    ---------------
    1st byte = 0xD2
    2nd byte = 0x04
    3rd byte = 0x00
    4th byte = 0x00
    

Otherwise, (my_pixel.RGBA)[0], or (u.bytes)[0], might be equal to byte0 (as I've defined it above) if the architecture is Little-endian, or equal to byte3 if the architecture is Big-endian.

See the endianness graphic in the Wikipedia article: https://en.wikipedia.org/wiki/Endianness. Notice that in big-endian, the most-significant byte of any given variable is stored first (meaning: at the lowest address) in memory, but in little-endian it is the least-significant byte that is stored first (at the lowest address). Also remember that endianness describes byte order, NOT bit order (bit order within a byte has nothing to do with endianness), and that each byte is written as 2 hex characters, each of which represents one 4-bit "nibble".

According to the Wikipedia article above, networking protocols usually use big-endian byte order, whereas most processors (x86, most ARM, etc.) are usually little-endian (emphasis added):

Big-endianness is the dominant ordering in networking protocols, such as in the internet protocol suite, where it is referred to as network order, transmitting the most significant byte first. Conversely, little-endianness is the dominant ordering for processor architectures (x86, most ARM implementations, base RISC-V implementations) and their associated memory.


More notes regarding whether or not "type punning" is supported by the standard

According to Wikipedia's "Type punning" article, writing to union member value but reading from RGBA[4] is "unspecified behavior". However, @Eric Postpischil points out in his comment below this answer that Wikipedia is wrong. The other references at the top of this answer also don't align with the Wikipedia answer as it is written now.

Eric Postpischil's comment, which I now understand and agree with, states (emphasis added):

The quoted text, about bytes corresponding to union members other than the last one stored, does not apply to this situation. It applies to a case where, for example, a two-byte short member is written and a four-byte int member is read. The extra two bytes are unspecified. This gives a C implementation license to implement the store to the short as a two-byte store (leaving the remaining bytes of the union unchanged) or a four-byte store (perhaps because it is efficient for the processor). In the case at hand, we have a four-byte uint32_t member and a four-byte uint8_t [4] member.

Wikipedia claims (as of 22 Apr. 2021):

For union:

union {
    unsigned int ui;
    float d;
} my_union = { .d = x };

Accessing my_union.ui after initializing the other member, my_union.d, is still a form of type-punning [4] in C and the result is unspecified behavior [5] (and undefined behavior in C++ [6]).

From reference [5] above: "Unspecified Behavior" includes:

The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).

Read that way, storing data into one member of a union and reading it from another, which is exactly what you want to use that union for, would be "unspecified behavior" per the C standard. As Eric Postpischil's comment above explains, though, that text does not apply here, because both members are four bytes.

gcc explicitly documents type punning (writing into one member of a union, but reading from another member in the union, as a form of "translation") as allowed in both C and C++. In C this is also backed by the footnote in the standard quoted above; in C++ it is a GNU extension and not part of the C++ standard.

See also:

  1. Download and run all of the above code from my repo here: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/blob/master/c/type_punning.c
  2. Unions, aliasing and type-punning in practice: what works and what does not?
  3. Unions and type-punning
  4. [my repo] I added READ_BYTE() as a macro to my utilities.h file in my eRCaGuy_hello_world repo.
  5. Where do I find the current C or C++ standard documents?
  6. https://news.ycombinator.com/item?id=17263328

    1. Is type-punning through a union unspecified in C99, and has it become specified in C11? <== See here especially. Apparently the C standard has not been very clear about this.

Keywords: type punning in C, translating types and structs to bytes in C
