Fix arithmetic error in distributed version


Problem Description


I am inverting a matrix via a Cholesky factorization, in a distributed environment, as it was discussed here. My code works fine, but in order to test that my distributed project produces correct results, I had to compare it with the serial version. The results are not exactly the same!

For example, the last five cells of the result matrix are:

serial gives:
-250207683.634793 -1353198687.861288 2816966067.598196 -144344843844.616425 323890119928.788757
distributed gives:
-250207683.634692 -1353198687.861386 2816966067.598891 -144344843844.617096 323890119928.788757

I had posted in the Intel forum about that, but the answer I got was about getting the same results across all the executions I make with the distributed version, something that I already had. They seem (in another thread) to be unable to answer this:

How can I get the same results between serial and distributed execution? Is this possible? This would result in fixing the arithmetic error.

I have tried setting mkl_cbwr_set(MKL_CBWR_AVX); and using mkl_malloc(), in order to align memory, but nothing changed. I get the same results only when I spawn a single process for the distributed version (which makes it almost serial)!
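As an aside on that setting: the conditional numerical reproducibility (CNR) path that mkl_cbwr_set() selects can also be chosen before launch through MKL's MKL_CBWR environment variable. Note that Intel documents CNR as making results reproducible from run to run on a fixed configuration, not as making distributed results match serial ones, which is consistent with what was observed above. A minimal configuration sketch:

```shell
# Alternative to calling mkl_cbwr_set() in code: select the same
# conditional-numerical-reproducibility (CNR) code path via the
# environment before the program starts.
export MKL_CBWR=AVX          # pin one instruction-set dispatch path
# export MKL_CBWR=COMPATIBLE # the most conservative (and slowest) setting
```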

The distributed routines I am calling: pdpotrf() and pdpotri().

The serial routines I am calling: dpotrf() and dpotri().

Solution

Your differences seem to appear at about the 12th s.f. Since floating-point arithmetic is not truly associative (that is, f-p arithmetic does not guarantee that a+(b+c) == (a+b)+c), and since parallel execution does not, generally, give a deterministic order of the application of operations, these small differences are typical of parallelised numerical codes when compared to their serial equivalents. Indeed you may observe the same order of difference when running on a different number of processors, 4 vs 8, say.

Unfortunately the only easy way to get deterministic results is to stick to serial execution. Getting deterministic results from parallel execution requires a major effort to be very specific about the order of execution of operations, right down to the last + or *, which almost certainly rules out the use of most numeric libraries and leads you to painstaking manual coding of large numeric routines.

In most cases that I've encountered, the accuracy of the input data, often derived from sensors, does not warrant worrying about the 12th or later s.f. I don't know what your numbers represent, but for many scientists and engineers, equality to the 4th or 5th s.f. is equal enough for all practical purposes. It's a different matter for mathematicians ...

