Reading double to platform endianness with union and bit shift, is it safe?

Question

All the examples I've seen of reading a double of known endianness from a buffer to the platform endianness involve detecting the current platform's endianness and performing byte-swapping when necessary.

On the other hand, I've seen another way of doing the same thing, though only for integers, that uses bit shifting (one such example).

This got me thinking that it might be possible to use a union and the bitshift technique to read doubles (and floats) from buffers, and a quick test implementation seemed to work (at least with clang on x86_64):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

double read_double(char * buffer, bool le) {
    union {
        double d;
        uint64_t i;
    } data;
    data.i = 0;

    int off = le ? 0 : 7;
    int add = le ? 1 : -1;
    for (int i = 0; i < 8; i++) {
        data.i |= ((uint64_t)(buffer[off] & 0xFF) << (i * 8));
        off += add;
    }
    return data.d;
}

int main() {
    char buffer_le[] = {0x6E, 0x86, 0x1B, 0xF0, 0xF9, 0x21, 0x09, 0x40};
    printf("%f\n", read_double(buffer_le, true)); // 3.141590

    char buffer_be[] = {0x40, 0x09, 0x21, 0xF9, 0xF0, 0x1B, 0x86, 0x6E};
    printf("%f\n", read_double(buffer_be, false)); // 3.141590

    return 0;
}

My question though is, is this a safe way to do this? Or is there undefined behavior involved here? Or if both this and the byte-swap method involve undefined behavior, is one safer than the other?

Recommended Answer

Reinterpreting Through a Union

Constructing a uint64_t value by shifting and ORing bytes is of course supported by the C standard. (There is some hazard when shifting due to the need to ensure the left operand is the correct size and type to avoid issues with overflow and shift width, but the code in the question correctly converts to uint64_t before shifting.) Then the question remaining for the code is whether reinterpreting through a union is permitted by the C standard. The answer is yes.
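The shifting hazard mentioned in the parenthetical can be illustrated with a minimal sketch. The helper names byte_at and load_le64 are hypothetical and do not come from the question; the sketch only assumes that uint64_t is available, as it is in the question's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place one byte at byte position `index`
   (0 = least significant) of a 64-bit value.
   The conversion to uint64_t happens BEFORE the shift. Without it,
   `byte` would be promoted to int, and shifting an int by 32 or more
   bits (where int is 32 bits wide) is undefined behavior. */
uint64_t byte_at(unsigned char byte, unsigned index)
{
    assert(index < 8); /* keeps the shift count below 64 */
    return (uint64_t)byte << (index * 8u);
}

/* Hypothetical helper: assemble 8 little-endian bytes into a uint64_t. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < 8; i++)
        v |= byte_at(p[i], i);
    return v;
}
```

Taking the bytes as unsigned char also removes the need for the & 0xFF mask that the question applies to its plain char buffer.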

C 6.5.2.3 3 says:


A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, [footnote 99] …

and footnote 99 says:


If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning")…

Such reinterpretation of course relies on the object representations used in the C implementation. Notably the double must use the expected format, matching the bytes read from the input stream.
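One way to make that reliance explicit is to check the representation before punning. The following is a sketch under stated assumptions, not part of the original answer: the function names are hypothetical, the compile-time checks assume a C11 compiler, and the expected format is assumed to be IEEE 754 binary64.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Compile-time checks (C11): double must have the size and the
   parameters of IEEE 754 binary64 for the punning to be meaningful. */
_Static_assert(sizeof(double) == sizeof(uint64_t),
               "double is not 64 bits wide");
_Static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double does not have binary64 parameters");

/* Reinterpret a 64-bit pattern as a double through a union. */
double bits_to_double(uint64_t bits)
{
    union { uint64_t i; double d; } pun;
    pun.i = bits;
    return pun.d;
}

/* Runtime self-check: known binary64 patterns must map to known values.
   This also catches a platform whose double byte order differs from
   its integer byte order. */
int punning_matches_binary64(void)
{
    return bits_to_double(UINT64_C(0x3FF0000000000000)) == 1.0
        && bits_to_double(UINT64_C(0xC000000000000000)) == -2.0;
}

/* Same conversion, with the self-check asserted in debug builds. */
double bits_to_double_checked(uint64_t bits)
{
    assert(punning_matches_binary64());
    return bits_to_double(bits);
}
```

If either static assertion fails, the build stops, which is arguably preferable to silently decoding the input bytes into the wrong value.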

Modifying an object by modifying its bytes (as by using a pointer to unsigned char) is permitted by C. C 2018 6.5 7 says:


An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [list of various types], or a character type.

Although one of the comments states that you may "access" but not "modify" the bytes of an object this way (apparently interpreting "access" to mean only reading, not writing), C 2018 3.1 defines "access" as:


to read or modify the value of an object.

Thus, one is permitted to read or write the bytes of an object through character types.
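For comparison, the byte-swap method from the question could look roughly like the sketch below, which writes the bytes of a double through an unsigned char pointer. All function names are hypothetical, and the code assumes that the platform stores a double in the same byte order as its integers and uses the same format as the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the platform stores the least significant byte first.
   The probe is inspected through a character type, which is permitted. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverse the bytes of a double in place through an unsigned char pointer. */
void swap_double_bytes(double *value)
{
    assert(value != NULL);
    unsigned char *bytes = (unsigned char *)value;
    for (size_t i = 0; i < sizeof *value / 2; i++) {
        unsigned char tmp = bytes[i];
        bytes[i] = bytes[sizeof *value - 1 - i];
        bytes[sizeof *value - 1 - i] = tmp;
    }
}

/* Copy sizeof(double) bytes from the buffer into a double, then swap
   them if the buffer's byte order differs from the host's. The object
   is read as a double only after its bytes are in host order. */
double read_double_swapping(const unsigned char *buffer, int buffer_is_le)
{
    assert(buffer != NULL);
    double result;
    unsigned char *dst = (unsigned char *)&result;
    for (size_t i = 0; i < sizeof result; i++)
        dst[i] = buffer[i];
    if ((buffer_is_le != 0) != host_is_little_endian())
        swap_double_bytes(&result);
    return result;
}
```

Unlike the shift-based version, this variant has to detect the host byte order, which is the extra step the question was hoping to avoid.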
