多个线程随机数 [英] Random numbers for multiple threads

查看:136
本文介绍了多个线程随机数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我打算以大约一百万伪随机数32位写Linux的C ++ 11个应用,做一些数值仿真(没有密码学)。为了加快速度,我想用台式机CPU的所有内核线程的并行执行模拟。我想用梅森倍捻机<一个href=\"http://www.boost.org/doc/libs/1_53_0/doc/html/boost/random/mt19937.html\"><$c$c>mt19937通过升压为PRNG提供的,我想,对于性能方面的原因,我应该有每个线程一个这样的PRNG。现在我不知道如何才能避免产生在多线程随机数的同一子种子它们。

I intend to write a C++11 application for Linux which does some numerical simulation (not cryptography) based on approximately one million pseudorandom 32bit numbers. To speed things up, I'd like to perform the simulation in parallel threads using all cores of a desktop CPU. I'd like to use the Mersenne Twister mt19937 provided by boost as the PRNG, and I guess that for performance reasons I should have one such PRNG per thread. Now I'm unsure about how to seed them in order to avoid generating the same subsequence of random numbers in multiple threads.

下面是我想到的迄今为止选择:

Here are the alternatives that I have thought of so far:


  1. 从独立种子PRNG为每个线程的/ dev / urandom的

我有点担心的情况下,当系统的熵池耗尽得到,因为我不知道系统内部PRNG如何运作的。它可能发生,我accidentially连续获得该种子准确识别梅森倍捻机的连续状态,由于这一事实,即的/ dev / urandom的使用的是梅森难题本身呢?可能密切相关的我的顾虑下一个点。

I'm a bit worried about the case when the system entropy pool gets exhausted, as I don't know how the system internal PRNG operates. Could it happen that I accidentially get consecutive seeds which exactly identify consecutive states of the Mersenne Twister, due to the fact that /dev/urandom is using a Mersenne Twister itself? Probably strongly related to my concerns for the next point.

种子一名来自的/ dev / urandom的 PRNG,并从第一个其他的。

Seed one PRNG from /dev/urandom and the others from that first one.

基本上同样的关注,以及:它是好还是坏使用一个PRNG种子另一个使用相同的算法?或者换句话说,不从 mt19937 直接对应于 mt19937 发生器的内部状态读625 32位整数这一代中的任何一点?

Basically the same concern as well: is it good or bad to use one PRNG to seed another that uses the same algorithm? Or in other words, does reading 625 32bit integers from a mt19937 correspond directly to the internal state of the mt19937 generator at any point during this generation?

种子他人先用非梅森信息。

Seed others from first with non-Mersenne information.

由于使用相同的算法来产生随机数并生成初始种子感觉莫名其妙地像它可能是一个坏主意,我想到了引入一些元素,它不依赖于梅森倍捻机算法。例如,我可以XOR线程ID到初始种子向量的每个元素。这是否使事情变得更好?

As using the same algorithm to generate random numbers and to generate the initial seed feels somehow like it might be a bad idea, I thought about introducing some element which is not dependent on the Mersenne Twister algorithm. For example, I could XOR the thread id into each element of the initial seed vector. Does that make things any better?

共享一个PRNG线程之间。

Share one PRNG among threads.

此将确保只有一个序列,梅森倍捻机的所有已知的和所希望的性能。但锁定开销需要控制访问该生成器让我担心一些。由于我没有发现任何证据,相反,我认为我作为图书馆用户将负责该PRNG preventing的并发访问。

This would make sure that there is only one sequence, with all the known and desirable properties of the Mersenne Twister. But the locking overhead required to control access to that generator does worry me somewhat. As I have found no evidence to the contrary, I assume that I as the library user would be responsible for preventing concurrent access to the PRNG.

pre-生成所有的随机数。

Pre-generate all random numbers.

此将有一个线程产生所有必需的1M随机数前面,要由不同线程以后使用。 4M的存储器要求将比起整体应用的是小的。我最担心这种方法是随机数本身的产生是不是并发。这整个方法也没有规模太清楚了。

This would have one thread generate all the required 1M random numbers up front, to be used by the different threads later on. The memory requirement of 4M would be small compared to that of the overall application. What worries me most in this approach is that the generation of random numbers itself is not concurrent. This whole approach also doesn't scale too well.

哪个这些方法,你会建议,为什么?或者你有不同的建议吗?

Questions

Which of these approaches would you suggest, and why? Or do you have a different suggestion?

你知道我的担心是有道理的,哪些是仅仅是由于我缺乏洞察事物是如何工作的?

Do you know which of my concerns are justified and which are simply due to my lack of insight into how things actually work?

推荐答案

我会用一个实例种子别人。我是pretty相信你能做到这一点安全很容易。

I'd use one instance to seed the others. I'm pretty sure you can do this safely fairly easily.


  • 即使在国家航天事业的微小变化相当大的变化下游 - 如果你能保证他们不具有完全相同的起始空间(并没有同样的状态preFIX),我就不会担心产生相同的数字。例如,仅使用值1,2,3种子三个线程就可以正常使用 - 你甚至都不需要种子的整个空间。另一个好处:通过明确predictable种子您可以轻松地抹黑,你樱桃采摘任何运行的想法(假设你想证明什么的)

  • 这是微不足道的,这意味着产生的孩子是高度不相关的方式播种。只是在重复一个广度优先的方式;也就是说,如果你想种子的N×623 int类型,不按顺序种子623的值,但挑头N和分发,那么随后的N等等。即使有播种机和儿童,之间的相关性有一定相关性不同的孩子应该是几乎不存在的 - 而这一切你关心

  • 我preFER一种算法,允许确定性执行只要有可能,这样的根据的上urandom设备不具吸引力。这使得调试更加容易。

  • 最后,显然 - 测试。这些PRNG是相当强劲,但通过各种手段眼球的结果,并通过做你模拟什么启发了一些相关的测试。大多数问题应该是显而易见的 - 无论你播种不好,有明显的重复序列,你你已经播种好,然后质量是由PRNG限制规定

  • 对于最后的执行,就大功告成了测试之后,就可以播种的第一个使用urandom的头脑为和/或线程ID的和平状态623值。

  • Even small changes in the state space cause fairly large changes downstream - if you can ensure they don't have exactly the same starting space (and no identical state prefix), I wouldn't worry about producing identical numbers. For instance, using just the values 1,2,3 to seed three threads would work fine - you don't even need to seed the entire space. Another advantage: by using clearly predictable seeds you can easily discredit the idea that you're cherry-picking any runs (assuming you're trying to demonstrate something).
  • It's trivial to seed in a way that means the resultant "children" are highly un-correlated. Just iterate in a breadth-first fashion; i.e. if you want to seed N x 623 int values, don't seed 623 values sequentially, but pick the first N and distribute, then the next N etc. Even if there's some correlation between the seeder and the children, the correlation between the various children should be virtually non-existant - and that's all you care about.
  • I'd prefer an algorithm that allows deterministic execution whenever possible, so depending on urandom is not attractive. This makes debugging easier.
  • Finally, and obviously - test. These PRNG are fairly robust, but by all means eyeball the results and do a few correlation tests inspired by what you're simulating. Most problems should be obvious - either you've seeded badly and there are obvious repeating subsequences, you you've seeded well, and then the quality is dictated by the PRNG limitations.
  • For final executions, after you're done testing, you can seed the first of 623 state values using urandom for peace of mind and/or the thread ID.

这篇关于多个线程随机数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆