How can floating point calculations be made deterministic?


Question

Floating point arithmetic on processors is neither associative nor distributive. So,

(a + b) + c is not necessarily equal to a + (b + c)

and a * (b + c) is not necessarily equal to a * b + a * c
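A minimal sketch of both inequalities in Python, whose `float` is an IEEE 754 double on mainstream platforms. The values are illustrative assumptions, chosen so the rounding can be checked by hand: at and above 2**53, neighbouring doubles are 2 apart, so adding 1.0 lands exactly on a tie.

```python
big = 2.0 ** 53  # from here up to 2**54, consecutive doubles are 2 apart

# Associativity: the grouping decides whether the two 1.0s survive.
left = (big + 1.0) + 1.0   # each +1.0 is a tie and rounds back to big
right = big + (1.0 + 1.0)  # big + 2.0 is exactly representable
print(left == right)  # False

# Distributivity: factoring changes which intermediate gets rounded.
a, b, c = 3.0, big, 1.0
factored = a * (b + c)    # b + c rounds to big, giving 3 * 2**53
expanded = a * b + a * c  # 3 * 2**53 + 3 rounds up to 3 * 2**53 + 4
print(factored == expanded)  # False
```

Each expression is individually deterministic; the results differ only because the operations are grouped differently.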

Is there any way to perform floating point calculations deterministically, so that they do not give different results? They would be deterministic on a uniprocessor, of course, but not in a multithreaded program where, for example, several threads add to a shared sum, because the threads may interleave differently from run to run.

So my question is, how can one achieve deterministic results for floating point calculations in multithreaded programs?

Recommended answer

Floating-point is deterministic. The same floating-point operations, run on the same hardware, always produce the same result. There is no black magic, noise, randomness, fuzzing, or any of the other things that people commonly attribute to floating-point. The tooth fairy does not show up, take the low bits of your result, and leave a quarter under your pillow.

Now, that said, certain blocked algorithms that are commonly used for large-scale parallel computations are non-deterministic in terms of the order in which floating-point computations are performed, which can result in non-bit-exact results across runs.
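To make the ordering effect concrete, here is a small single-threaded simulation. The function `accumulate` and the four input values are hypothetical, chosen only for illustration: they stand in for contributions that threads add to one shared total in whatever order they happen to finish.

```python
def accumulate(arrivals):
    """Add values to one shared total in the order they arrive."""
    total = 0.0
    for x in arrivals:
        total += x
    return total

# The same four contributions, arriving in two different orders.
run_a = accumulate([1e16, 1.0, -1e16, 1.0])  # first 1.0 is absorbed by 1e16
run_b = accumulate([1e16, -1e16, 1.0, 1.0])  # large terms cancel first
print(run_a, run_b)  # 1.0 2.0
```

Every individual addition is deterministic; only the arrival order changes between the two runs, and that alone is enough to change the result.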

What can you do about it?

First, make sure that you actually can't live with the situation. Many of the things you might try in order to enforce ordering in a parallel computation will hurt performance. That's just how it is.

I would also note that although blocked algorithms may introduce some amount of non-determinism, they frequently deliver results with smaller rounding errors than naive unblocked serial algorithms do (surprising but true!). If you can live with the errors produced by a naive serial algorithm, you can probably live with the errors of a parallel blocked algorithm.
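A rough illustration of the rounding-error claim, using pairwise (recursive halving) summation as a stand-in for a blocked algorithm. That substitution is an assumption for the sake of a short example, not the specific algorithm the answer has in mind. The input is deliberately simple, and `math.fsum` provides a correctly rounded reference sum.

```python
import math

def naive_sum(values):
    """Left-to-right serial accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total

def pairwise_sum(values, lo=0, hi=None):
    """Sum values[lo:hi] by splitting the range in half recursively."""
    if hi is None:
        hi = len(values)
    if hi - lo == 0:
        return 0.0
    if hi - lo == 1:
        return values[lo]
    mid = (lo + hi) // 2
    return pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi)

values = [0.1] * 2**16
reference = math.fsum(values)  # correctly rounded sum of the inputs
naive_err = abs(naive_sum(values) - reference)
pairwise_err = abs(pairwise_sum(values) - reference)
print(naive_err, pairwise_err)
```

On this input the pairwise error is no larger than the naive one, because the halving scheme only ever adds partial sums of similar magnitude. Other inputs will show a smaller or larger gap.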

Now, if you really, truly, need exact reproducibility across runs, here are a few suggestions that tend not to adversely affect performance too much:

  1. Don't use multithreaded algorithms that can reorder floating-point computations. Problem solved. This doesn't mean you can't use multithreaded algorithms at all, merely that you need to ensure that each individual result is only touched by a single thread between synchronization points. Note that this can actually improve performance on some architectures if done properly, by reducing data-cache (D$) contention between cores.

  2. In reduction operations, you can have each thread store its result to an indexed location in an array, wait for all threads to finish, then accumulate the elements of the array in order. This adds a small amount of memory overhead, but is generally pretty tolerable, especially when the number of threads is "small".

  3. Find ways to hoist the parallelism. Instead of computing 24 matrix multiplications, each one of which uses parallel algorithms, compute 24 matrix products in parallel, each one of which uses a serial algorithm. This, too, can be beneficial for performance (sometimes enormously so).
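The reduction suggestion can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: `deterministic_sum`, its `num_workers` parameter, and the chunking scheme are hypothetical names and choices. Each worker owns one slot of a `partials` array, and the final accumulation walks that array in index order after all workers have finished.

```python
from concurrent.futures import ThreadPoolExecutor

def deterministic_sum(values, num_workers=4):
    """Sum values with threads, independent of thread scheduling."""
    # Chunk boundaries depend only on len(values) and num_workers.
    chunk = (len(values) + num_workers - 1) // num_workers
    partials = [0.0] * num_workers

    def work(i):
        total = 0.0
        for x in values[i * chunk:(i + 1) * chunk]:
            total += x
        partials[i] = total  # only worker i ever writes slot i

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Consuming the iterator waits for every worker (the sync point).
        list(pool.map(work, range(num_workers)))

    # Serial accumulation in index order, not in completion order.
    result = 0.0
    for p in partials:
        result += p
    return result

data = [1e16, 1.0, -1e16, 1.0] * 1000
results = {deterministic_sum(data) for _ in range(20)}
print(len(results))  # 1
```

The result is reproducible for a fixed `num_workers`. Changing the worker count changes the chunk boundaries and may therefore change the low bits of the sum, so the count has to be pinned if results must match across machines.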

There are lots of other ways to handle this. They all require thought and care. Parallel programming usually does.
