Best practices for seeding random and numpy.random in the same program
Question
In order to make random simulations we run reproducible later, my colleagues and I often explicitly seed the `random` or `numpy.random` modules' random number generators using the `random.seed` and `np.random.seed` methods. Seeding with an arbitrary constant like 42 is fine if we're just using one of those modules in a program, but sometimes, we use both `random` and `np.random` in the same program. I'm unsure whether there are any best practices I should be following about how to seed the two RNGs together.
In particular, I'm worried that there's some sort of trap we could fall into where the two RNGs together behave in a "non-random" way, such as both generating the exact same sequence of random numbers, or one sequence trailing the other by a few values (e.g. the kth number from `random` is always the k+20th number from `np.random`), or the two sequences being related to each other in some other mathematical way. (I realise that pseudo-random number generators are all imperfect simulations of true randomness, but I want to avoid exacerbating this with poor seed choices.)
With this objective in mind, are there any particular ways we should or shouldn't seed the two RNGs? I've used, or seen colleagues use, a few different tactics, like:
- Using the same arbitrary seed:

    random.seed(42)
    np.random.seed(42)

- Using two different arbitrary seeds:

    random.seed(271828)
    np.random.seed(314159)

- Using a random number from one RNG to seed the other:

    random.seed(42)
    # randint is inclusive at both ends; np.random.seed accepts 0 to 2**32 - 1
    np.random.seed(random.randint(0, 2**32 - 1))
... and I've never noticed any strange outcomes from any of these approaches... but maybe I've just missed them. Are there any officially blessed approaches to this? And are there any possible traps that I can spot and raise the alarm about in code review?
Recommended answer
I will discuss some guidelines on how multiple pseudorandom number generators (PRNGs) should be seeded. I assume you're not using random numbers for information security purposes (if you are, only a cryptographic RNG is appropriate and this advice doesn't apply).
- To reduce the risk of correlated random numbers, you can use PRNG algorithms, such as SFC and other so-called "counter-based" PRNGs (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011), that support independent "streams" of random numbers. There are other strategies as well, and I explain more about this in "Seeding Multiple Processes".
- If you can use NumPy 1.17, note that that version introduced a new PRNG system and added SFC (`SFC64`) to its repertoire of PRNGs. For NumPy-specific advice on parallel random generation, see "Parallel Random Number Generation" in the NumPy documentation.
- You should avoid seeding PRNGs (especially several at once) with timestamps.
- You mentioned this question in a comment, when I started writing this answer. The advice there is not to seed multiple instances of the same kind of PRNG. This advice, however, doesn't apply as much if the seeds are chosen to be unrelated to each other, or if a PRNG with a very big state (such as Mersenne Twister) or a PRNG that gives each seed its own nonoverlapping random number sequence (such as SFC) is used. The accepted answer there (at the time of this writing) demonstrates what happens when multiple instances of .NET's `System.Random`, with sequential seeds, are used, but not necessarily what happens with PRNGs of a different design, PRNGs of multiple designs, or PRNGs initialized with unrelated seeds. Moreover, .NET's `System.Random` is a poor choice for a PRNG precisely because it allows only seeds no more than 32 bits long (so the number of random sequences it can produce is limited), and also because it has implementation bugs (if I understand correctly) that have been preserved for backward compatibility.
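As a rough illustration of the points above, here is a minimal sketch of one way to derive unrelated seeds for both generators from a single recorded root seed. It assumes NumPy 1.17 or later and uses NumPy's `SeedSequence` together with the newer `Generator` API (backed by `SFC64`) instead of the legacy global `np.random.seed`; that substitution is my assumption, not something the question's code does. The names `make_rngs` and `ROOT_SEED` are hypothetical and exist only for this example.

```python
import random

import numpy as np

ROOT_SEED = 42  # hypothetical root seed; log it so a run can be reproduced


def make_rngs(root_seed):
    """Return (stdlib RNG, NumPy Generator) seeded from one root seed."""
    # SeedSequence mixes the root seed; spawn() derives two child
    # sequences intended to produce unrelated streams.
    root = np.random.SeedSequence(root_seed)
    py_child, np_child = root.spawn(2)

    # Build a 128-bit integer from the first child's state and use it
    # to seed a private instance of the standard-library generator.
    words = py_child.generate_state(4, dtype=np.uint32)
    py_seed = sum(int(w) << (32 * i) for i, w in enumerate(words))
    py_rng = random.Random(py_seed)

    # Seed a NumPy Generator backed by SFC64 from the second child.
    np_rng = np.random.Generator(np.random.SFC64(np_child))
    return py_rng, np_rng


if __name__ == "__main__":
    py_rng, np_rng = make_rngs(ROOT_SEED)
    print(py_rng.random(), np_rng.random())
```

Passing the two generator objects around explicitly, rather than seeding module-level state, also makes it easier to see in code review which part of a simulation draws from which stream.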