How to write portable floating point arithmetic in C++?
Problem description
Say you're writing a C++ application doing lots of floating point arithmetic, and say this application needs to be portable across a reasonable range of hardware and OS platforms (say 32- and 64-bit hardware, Windows and Linux in both 32- and 64-bit flavors...).
How would you make sure that your floating point arithmetic is the same on all platforms? For instance, how can you be sure that a 32-bit floating point value will really be 32 bits on all platforms?
For integers we have stdint.h, but there doesn't seem to be a floating point equivalent.
[EDIT]
I got very interesting answers, but I'd like to make the question more precise.
For integers, I can write:
#include <stdint.h>
[...]
int32_t myInt;
and be sure that, whatever (C99-compatible) platform I'm on, myInt is a 32-bit integer.
If I write:
double myDouble;
float myFloat;
am I certain that these will compile to 64-bit and 32-bit floating point numbers, respectively, on all platforms?
Non-IEEE 754
Generally, you cannot. There's always a trade-off between consistency and performance, and C++ leaves that trade-off to you.
For platforms that don't have floating point operations (like embedded and signal processing processors), you cannot use C++ "native" floating point operations, at least not portably. While a software emulation layer would be possible, it's certainly not feasible for this type of device.
For these, you could use 16-bit or 32-bit fixed point arithmetic (though you might even discover that long is only rudimentarily supported, and that div is frequently very expensive). However, this will be much slower than built-in fixed point arithmetic, and becomes painful beyond the basic four operations.
I haven't come across devices that support floating point in a format other than IEEE 754. From my experience, your best bet is to hope for the standard, because otherwise you usually end up building algorithms and code around the capabilities of the device. When sin(x) suddenly costs 1000 times as much, you'd better pick an algorithm that doesn't need it.
IEEE 754 - Consistency
The only non-portability I've found here is when you expect bit-identical results across platforms. The biggest influence is the optimizer. Again, you can trade accuracy and speed for consistency. Most compilers have an option for that, e.g. "floating point consistency" in Visual C++. Note, however, that this is always accuracy beyond the guarantees of the standard.
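The exact spelling of those options differs per toolchain; a few real switches, to be verified against your compiler's own documentation:

```shell
# Consistency-vs-speed switches (check your compiler's docs for details):
cl /fp:strict source.cpp          # MSVC: strict IEEE 754 semantics
cl /fp:fast source.cpp            # MSVC: allow value-unsafe optimizations
g++ -ffp-contract=off source.cpp  # GCC/Clang: disable FMA contraction
g++ -ffast-math source.cpp        # GCC/Clang: trade conformance for speed
```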
Why do results become inconsistent? First, FPU registers often have higher resolution than a double (e.g. 80 bits), so as long as the code generator doesn't store the value back, intermediate values are held with higher accuracy.
Second, equivalences like a*(b+c) = a*b + a*c are not exact due to the limited precision. Nonetheless the optimizer, if allowed, may make use of them.
Also - what I learned the hard way - printing and parsing functions are not necessarily consistent across platforms, probably due to numeric inaccuracies, too.
float
It is a common misconception that float operations are intrinsically faster than double. Working on large float arrays is usually faster simply through fewer cache misses.
Be careful with float accuracy. It can be "good enough" for a long time, but I've often seen it fail faster than expected. Float-based FFTs can be much faster due to SIMD support, but they generate notable artefacts quite early for audio processing.