Portable way to serialize float as 32-bit integer

Question

I have been struggling with finding a portable way to serialize 32-bit float variables in C and C++ to be sent to and from microcontrollers. I want the format to be well-defined enough so that serialization/de-serialization can be done from other languages as well without too much effort. Related questions are:

Portability of binary serialization of double/float type in C++

Serialize double and float with C

c++ portable conversion of long to double

I know that in most cases a typecast, union, or memcpy will work just fine because the float representation is the same, but I would prefer to have a bit more control and peace of mind. What I came up with so far is the following:

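#include <math.h>
#include <stdbool.h>
#include <stdint.h>

// Writes number as 4 big-endian bytes in IEEE 754 binary32 layout.
// Inf, NaN and subnormals are not handled.
void serialize_float32(uint8_t* buffer, float number, int32_t *index) {
    int e = 0;
    float sig = frexpf(number, &e);  // |sig| is in [0.5, 1), or 0 for zero
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;

    if (sig_abs >= 0.5f) {
        // Drop the implicit leading bit and scale to 23 fraction bits.
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
        e += 126;  // IEEE bias is 127; the frexpf exponent is one higher
    }

    uint32_t res = (((uint32_t)e & 0xFFu) << 23) | (sig_i & 0x7FFFFFu);
    if (sig < 0) {
        res |= UINT32_C(1) << 31;  // 1 << 31 would overflow a signed int
    }

    buffer[(*index)++] = (res >> 24) & 0xFF;
    buffer[(*index)++] = (res >> 16) & 0xFF;
    buffer[(*index)++] = (res >> 8) & 0xFF;
    buffer[(*index)++] = res & 0xFF;
}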

// Reads 4 big-endian bytes in IEEE 754 binary32 layout and rebuilds the float.
float deserialize_float32(const uint8_t *buffer, int32_t *index) {
    uint32_t res = ((uint32_t) buffer[*index]) << 24 |
                ((uint32_t) buffer[*index + 1]) << 16 |
                ((uint32_t) buffer[*index + 2]) << 8 |
                ((uint32_t) buffer[*index + 3]);
    *index += 4;

    int e = (res >> 23) & 0xFF;
    uint32_t sig_i = res & 0x7FFFFF;
    bool neg = (res >> 31) != 0;  // avoids the signed overflow in 1 << 31

    float sig = 0.0f;
    if (e != 0 || sig_i != 0) {
        // Restore the implicit leading bit so that sig lands in [0.5, 1).
        sig = (float)sig_i / (8388608.0f * 2.0f) + 0.5f;
        e -= 126;
    }

    if (neg) {
        sig = -sig;
    }

    return ldexpf(sig, e);
}

The frexp and ldexp functions seem to be made for this purpose, but in case they aren't available I also tried to implement them manually using more commonly available functions:

float frexpf_slow(float f, int *e) {
    if (f == 0.0f) {
        *e = 0;
        return 0.0f;
    }

    *e = (int)ceilf(log2f(fabsf(f)));
    float res = f / powf(2.0f, (float)*e);

    // Make sure that the magnitude stays below 1 so that no overflow occurs
    // during serialization. For exact powers of two (and for values where
    // log2f rounds down to an integer) |res| comes out as 1 or slightly
    // above, so halve it and bump the exponent. Halving, unlike subtracting
    // 0.5, keeps res * 2^e equal to f.

    if (res >= 1.0f || res <= -1.0f) {
        res *= 0.5f;
        *e += 1;
    }

    return res;
}

float ldexpf_slow(float f, int e) {
    return f * powf(2.0f, (float)e);
}

One thing I have been considering is whether to use 8388608 (2^23) or 8388607 (2^23 - 1) as the multiplier. The documentation says that frexp returns values that are less than 1 in magnitude, and after some experimentation it seems that 8388608 gives results that are bit-accurate with actual floats; I could not find any corner case where this overflows. That might not be true with a different compiler/system though. If this can become a problem, a smaller multiplier that reduces the accuracy a bit is fine with me as well. I know that this does not handle Inf or NaN, but for now that is not a requirement.

So, finally, my question is: Does this look like a reasonable approach, or am I just making a complicated solution that still has portability issues?

Recommended answer

You seem to have a bug in serialize_float32: the last 4 lines should read:

buffer[(*index)++] = (res >> 24) & 0xFF;
buffer[(*index)++] = (res >> 16) & 0xFF;
buffer[(*index)++] = (res >> 8) & 0xFF;
buffer[(*index)++] = res & 0xFF;

Your method might not work correctly for infinities and/or NaNs because of the offset by 126 instead of 128. Note that you can validate it by extensive testing: there are only 4 billion values, trying all possibilities should not take very long.
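A minimal sketch of such a test is shown below. It assumes the host itself stores float as IEEE 754 binary32 with the same byte order as uint32_t, so the native bit pattern can serve as the reference. The names encode_f32, decode_f32 and count_mismatches are hypothetical; the first two follow the question's frexpf/ldexpf scheme with the buffer handling left out. Zeros, subnormals, infinities and NaNs are skipped because the question's code does not aim to cover them.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Encode a normal float into a binary32 bit pattern using frexpf only. */
static uint32_t encode_f32(float number) {
    int e = 0;
    float sig = frexpf(number, &e);
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;
    if (sig_abs >= 0.5f) {
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
        e += 126;
    }
    uint32_t res = (((uint32_t)e & 0xFFu) << 23) | (sig_i & 0x7FFFFFu);
    if (sig < 0.0f) {
        res |= UINT32_C(1) << 31;
    }
    return res;
}

/* Decode a binary32 bit pattern back into a float using ldexpf only. */
static float decode_f32(uint32_t res) {
    int e = (int)((res >> 23) & 0xFFu);
    uint32_t sig_i = res & 0x7FFFFFu;
    float sig = 0.0f;
    if (e != 0 || sig_i != 0) {
        sig = (float)sig_i / (8388608.0f * 2.0f) + 0.5f;
        e -= 126;
    }
    if ((res >> 31) != 0) {
        sig = -sig;
    }
    return ldexpf(sig, e);
}

/* Visit bit patterns 0, step, 2*step, ... and count round-trip failures.
   step == 1 visits all 2^32 patterns; a larger step gives a quick sample. */
static uint64_t count_mismatches(uint32_t step) {
    assert(step > 0);
    uint64_t bad = 0;
    for (uint64_t bits = 0; bits <= UINT32_MAX; bits += step) {
        uint32_t in = (uint32_t)bits;
        uint32_t exp_field = (in >> 23) & 0xFFu;
        if (exp_field == 0 || exp_field == 255) {
            continue; /* zero, subnormal, Inf or NaN: out of scope here */
        }
        float f;
        memcpy(&f, &in, sizeof f);       /* native bits -> float */
        uint32_t enc = encode_f32(f);
        float back = decode_f32(enc);
        uint32_t out;
        memcpy(&out, &back, sizeof out); /* float -> native bits */
        if (enc != in || out != in) {
            bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches(1) performs the full sweep; any nonzero result means some normal value does not survive the round trip on that compiler and target.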

The actual in-memory representation of float values may differ between architectures, but IEEE 754 (or more precisely IEC 60559) is prevalent today. You can verify whether your particular targets are compliant by checking whether __STDC_IEC_559__ is defined. Note however that even if you can assume IEEE 754, you must handle potentially different endianness between the systems. You cannot assume the endianness of floats to be the same as that of integers on the same platform.
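The checks described above could look roughly like the following sketch. All function names are hypothetical. Since some IEEE 754 targets leave __STDC_IEC_559__ undefined, the sketch also inspects the <float.h> parameters, and it probes at run time whether a float and a uint32_t share the same byte order.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* 1 if the compiler advertises IEC 60559 (IEEE 754) conformance.
   A result of 0 is not conclusive: the macro is optional. */
static int iec559_macro_defined(void) {
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* 1 if float has the size and parameters of IEEE 754 binary32. */
static int float_is_binary32(void) {
    return CHAR_BIT == 8 && sizeof(float) == 4 && FLT_RADIX == 2 &&
           FLT_MANT_DIG == 24 && FLT_MIN_EXP == -125 && FLT_MAX_EXP == 128;
}

/* 1 if a float and a uint32_t use the same byte order in memory.
   0x1.921fb6p+1f is pi rounded to float, bit pattern 0x40490FDB;
   its four bytes are distinct, so any reordering is visible.
   Only meaningful when float_is_binary32() returned 1. */
static int float_matches_int_byte_order(void) {
    float probe = 0x1.921fb6p+1f;
    uint32_t bits;
    memcpy(&bits, &probe, sizeof bits);
    return bits == UINT32_C(0x40490FDB);
}

/* Abort early if the assumptions behind a raw-bits format do not hold. */
static void require_portable_float(void) {
    assert(float_is_binary32());
    assert(float_matches_int_byte_order());
}
```

If either check fails on a target, the frexpf/ldexpf approach from the question remains a fallback, since it never looks at the in-memory representation.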

Note also that the simple cast would be incorrect: uint32_t res = *(uint32_t *)&number; violates the strict aliasing rule. You should either use a union (well defined in C, but not in C++) or copy the bytes with memcpy(&res, &number, sizeof(res)).
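As an illustration of the memcpy route, here is a sketch that copies the float's bits into a uint32_t and then emits them most significant byte first, mirroring the buffer/index convention of the question. The helper names are hypothetical, and the code assumes that float is IEEE 754 binary32 with the same byte order as uint32_t on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Reinterpret the bits of a float without violating strict aliasing. */
static uint32_t float_to_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Inverse of float_to_bits. */
static float bits_to_float(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Append value to buffer as 4 big-endian bytes and advance *index. */
static void write_f32_be(uint8_t *buffer, int32_t *index, float value) {
    assert(buffer != NULL && index != NULL);
    uint32_t bits = float_to_bits(value);
    buffer[(*index)++] = (uint8_t)(bits >> 24);
    buffer[(*index)++] = (uint8_t)(bits >> 16);
    buffer[(*index)++] = (uint8_t)(bits >> 8);
    buffer[(*index)++] = (uint8_t)bits;
}

/* Read 4 big-endian bytes from buffer, advance *index, return the float. */
static float read_f32_be(const uint8_t *buffer, int32_t *index) {
    assert(buffer != NULL && index != NULL);
    uint32_t bits = (uint32_t)buffer[*index] << 24 |
                    (uint32_t)buffer[*index + 1] << 16 |
                    (uint32_t)buffer[*index + 2] << 8 |
                    (uint32_t)buffer[*index + 3];
    *index += 4;
    return bits_to_float(bits);
}
```

Because the byte order on the wire comes from the shifts rather than from memory layout, the integer endianness of the host does not matter; only a mismatch between float and integer byte order would break it.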
