Is boost::random::uniform_real_distribution supposed to be the same across processors?

Question

The following code produces different output on x86 32bit vs 64bit processors.

Is it supposed to be this way? If I replace it with std::uniform_real_distribution and compile with -std=c++11, it produces the same output on both processors.

#include <iostream>
#include <limits>   // std::numeric_limits
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_real_distribution.hpp>

int main()
{
    boost::mt19937 gen;
    gen.seed(4294653137UL);
    std::cout.precision(1000);
    double lo = - std::numeric_limits<double>::max() / 2 ;
    double hi = + std::numeric_limits<double>::max() / 2 ;
    boost::random::uniform_real_distribution<double> boost_distrib(lo, hi);
    std::cout << "lo " << lo << '\n';
    std::cout << "hi " << hi << "\n\n";
    std::cout << "boost distrib gen " << boost_distrib(gen) << '\n';
}

Solution

BTW, you could have written boost::mt19937 gen(4294653137UL); to avoid seeding with the default seed (5489) in the default constructor. As written, your code loops over all 624 uint32_t elements of the generator's internal state twice: once for the default seed and once for gen.seed().


The generator is always fine and works the same on any machine. The difference comes only from the floating-point math that maps its output onto the range of a uniform_real_distribution.

g++ -m32 -msse2 -mfpmath=sse produces output identical to all the other compilers. 32-bit and 64-bit builds differ because 64-bit code uses SSE for floating-point math, so double temporaries are always 64-bit. 32-bit x86 defaults to the legacy x87 FPU, where everything is 80-bit internally and is only rounded down to a 64-bit double when stored to memory.

32-bit clang still uses SSE math by default, so it gets results identical to 64-bit clang or 64-bit g++. Telling g++ to do the same solves the problem. -mfpmath=sse tells it to do calculations with SSE (although it doesn't change the ABI, so floating-point return values are still in x87 st(0)). -msse2 tells g++ to assume the target machine supports SSE and SSE2. (SSE2 added double precision to SSE's single precision. SSE2 is baseline in the x86-64 architecture, and is used to pass/return FP args in the 64-bit ABI.)

Without SSE, you could (but shouldn't) use -ffloat-store to follow the C standard precisely and round intermediate results to 32 or 64 bits by storing and reloading them. This adds about 6 cycles of latency to every FP math instruction (compared to a 3-cycle FP add and a 5-cycle FP mul on Intel Haswell). So don't do this; you'll get horrible code.


Debugging steps: I tried it out on Ubuntu 15.10 with g++ 5.2, clang-3.5, and clang-3.8 (from http://llvm.org/apt/).

for i in ./boost-random-seedint*; do echo -ne "$i:\t" ; $i|md5sum ;done
./boost-random-seedint-g++32:           53d99523ca2afeac428eae2c89e69974  -
./boost-random-seedint-g++64:           a59f08c0bc22b8753c474db077b809bd  -
./boost-random-seedint-clang3.5-32:     a59f08c0bc22b8753c474db077b809bd  -
./boost-random-seedint-clang3.5-64:     a59f08c0bc22b8753c474db077b809bd  -
./boost-random-seedint-clang3.8-32:     a59f08c0bc22b8753c474db077b809bd  -
./boost-random-seedint-clang3.8-64:     a59f08c0bc22b8753c474db077b809bd  -

So the only outlier is 32-bit g++; all the other outputs have the same hash.

Compiler options:

clang++-3.8 -m32 -O1 -g boost-random-seedint.cpp -o boost-random-seedint-clang3.8-32  # and similar
g++ -m32 -Og -g boost-random-seedint.cpp -o boost-random-seedint32

clang doesn't have -Og. 32-bit g++ at -O0 and -O3 produces binaries that give the same output as the -Og build.


Debugging the 32 and 64bit binaries: their state arrays are identical after the default seed and after the call to gen.seed(4294653137UL).

Hmm, I wonder if this is a -ffloat-store issue, with x87 float math keeping 80bit precision for intermediate results.
