Parse double precision IEEE floating-point on a C compiler with no double precision type

Views: 261 Published: 2016/8/21 20:48:45 Tags: c++ c casting avr


Problem description

I am working with an 8-bit AVR chip. There is no data type for a 64-bit double (double just maps to the 32-bit float). However, I will be receiving 64-bit doubles over Serial and need to output 64-bit doubles over Serial.

How can I convert the 64-bit double to a 32-bit float and back again without casting? The format for both the 32-bit and 64-bit will follow IEEE 754. Of course, I assume a loss of precision when converting to the 32-bit float.

For converting from 64-bit to 32-bit float, I am trying this out:

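// Script originally from http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1281990303
#include <stdint.h>

// in points to the 8 bytes of an IEEE 754 double, least significant byte first.
float convert(uint8_t *in) {
  union {
    float real;
    uint8_t base[4];
  } u;
  // 11-bit double exponent: low 7 bits of in[7] and high 4 bits of in[6].
  uint16_t expd = ((in[7] & 127) << 4) + ((in[6] & 240) >> 4);
  // Rebias from 1023 to 127; an exponent field of 0 stays 0.
  uint16_t expf = expd ? (expd - 1024) + 128 : 0;
  u.base[3] = (in[7] & 128) + (expf >> 1);  // sign and top 7 exponent bits
  u.base[2] = ((expf & 1) << 7) + ((in[6] & 15) << 3) + ((in[5] & 0xe0) >> 5);
  u.base[1] = ((in[5] & 0x1f) << 3) + ((in[4] & 0xe0) >> 5);
  u.base[0] = ((in[4] & 0x1f) << 3) + ((in[3] & 0xe0) >> 5);
  return u.real;
}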

For numbers like 1.0 and 2.0, the above works, but when I tested with passing in a 1.1 as a 64-bit double, the output was off by a bit (literally, not a pun!), though this could be an issue with my testing. See:

// Comparison of bits for a float in Java and the bits for a float in C after
// converted from a 64-bit double. Last bit is different.
// Java code can be found at https://gist.github.com/912636
JAVA FLOAT:        00111111 10001100 11001100 11001101
C CONVERTED FLOAT: 00111111 10001100 11001100 11001100

Solution

IEEE 754 specifies five rounding modes, but the default, and the one to use here, is round half to even. Your code truncates the mantissa instead of rounding it, which is why the last bit differs. You have a mantissa of the form 10001100 11001100 11001100 11001100... and you have to round it to 24 bits. Numbering the bits from 0 (most significant), bit 24 is 1, but that alone does not tell you whether to round bit 23 up. If all the remaining bits were 0, it would be an exact tie and you would not round up, because bit 23 is 0 (even). But the remaining bits are not all zero, so the value lies above the halfway point and you round up regardless of bit 23.
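As a minimal sketch of that rule, the decision can be written in terms of the last kept bit (bit 23), the first dropped bit (bit 24) and whether any bit after it is set. The function name `round_to_24` and its `more_bits` flag are hypothetical and only serve this illustration; `m` is assumed to hold mantissa bits 0 to 31 with bit 0 in the most significant position, as numbered in this answer.

```c
#include <stdint.h>

/* m holds mantissa bits 0..31 (bit 0 = most significant).
   more_bits is nonzero if any mantissa bit beyond bit 31 is set.
   Returns the mantissa rounded half-to-even to 24 bits; the result
   can be 0x1000000 if the rounding carries out of the top bit. */
uint32_t round_to_24(uint32_t m, uint8_t more_bits)
{
    uint32_t kept   = m >> 8;                           /* bits 0..23 */
    uint8_t  guard  = (uint8_t)((m >> 7) & 1);          /* bit 24 */
    uint8_t  sticky = ((m & 0x7F) != 0) || more_bits;   /* bits 25 onward */

    /* Round up when above the halfway point, or exactly halfway
       with an odd last kept bit. */
    if (guard && (sticky || (kept & 1)))
        kept += 1;
    return kept;
}
```

For the mantissa of 1.1, `round_to_24(0x8CCCCCCCUL, 1)` gives `0x8CCCCD`, where plain truncation would give `0x8CCCCC`.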

Some examples:

10001100 11001100 11001100 10000000...(all zero) doesn't round up, because bit 23 is already even.

10001100 11001100 11001101 10000000...(all zero) does round up, because bit 23 is odd.

10001100 11001100 1100110x 10000000...0001 always rounds up, because the remaining bits are not all zero.

10001100 11001100 1100110x 0xxxxxxx... never rounds up, because bit 24 is zero.
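Applied to the byte layout used in the question, the rule might look like the sketch below. It assumes the double arrives least significant byte first (as `convert` in the question already does) and that `float` and `uint32_t` are both 32 bits wide with the same byte order in memory, which holds on AVR with avr-gcc. The names `double_bytes_to_float` and `float_to_double_bytes` are made up for illustration. Only zero and values that are normal in both formats are handled; subnormals, out-of-range exponents, infinities and NaN would need extra cases. The reverse direction needs no rounding, because every float is exactly representable as a double.

```c
#include <stdint.h>
#include <string.h>

/* Narrow 8 little-endian bytes of an IEEE 754 double to a float,
   rounding half to even. Sketch: zero and normal values only. */
float double_bytes_to_float(const uint8_t *in)
{
    uint16_t expd = (uint16_t)(((in[7] & 0x7F) << 4) | (in[6] >> 4));
    uint32_t bits = (uint32_t)(in[7] & 0x80) << 24;      /* sign bit */
    float f;

    if (expd != 0) {
        /* Top 23 of the 52 fraction bits. */
        uint32_t frac = ((uint32_t)(in[6] & 0x0F) << 19)
                      | ((uint32_t)in[5] << 11)
                      | ((uint32_t)in[4] << 3)
                      | (uint32_t)(in[3] >> 5);
        uint8_t guard  = in[3] & 0x10;                   /* first dropped bit */
        uint8_t sticky = (in[3] & 0x0F) | in[2] | in[1] | in[0]; /* the rest */

        /* Rebias the exponent from 1023 to 127. */
        bits |= ((uint32_t)(expd - 1023 + 127) << 23) | frac;
        if (guard && (sticky || (frac & 1)))
            bits += 1;   /* a carry out of the fraction bumps the exponent */
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Widen a float to 8 little-endian bytes of an IEEE 754 double.
   Exact for zero and normal floats, so nothing is rounded. */
void float_to_double_bytes(float f, uint8_t *out)
{
    uint32_t bits, frac;
    uint16_t expf, expd;

    memcpy(&bits, &f, sizeof bits);
    expf = (uint16_t)((bits >> 23) & 0xFF);
    expd = expf ? (uint16_t)(expf - 127 + 1023) : 0;     /* rebias 127 -> 1023 */
    frac = bits & 0x7FFFFFUL;

    out[7] = (uint8_t)(((bits >> 24) & 0x80) | (expd >> 4));
    out[6] = (uint8_t)(((expd & 0x0F) << 4) | (frac >> 19));
    out[5] = (uint8_t)(frac >> 11);
    out[4] = (uint8_t)(frac >> 3);
    out[3] = (uint8_t)((frac & 0x07) << 5);
    out[2] = out[1] = out[0] = 0;                        /* low fraction bits are zero */
}
```

Fed the eight bytes of 1.1 (`9A 99 99 99 99 99 F1 3F`), `double_bytes_to_float` produces the bit pattern 00111111 10001100 11001100 11001101, the same as the Java result shown in the question.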
