Best practices for seeding random and numpy.random in the same program

Question

In order to make random simulations we run reproducible later, my colleagues and I often explicitly seed the random or numpy.random modules' random number generators using the random.seed and np.random.seed methods. Seeding with an arbitrary constant like 42 is fine if we're just using one of those modules in a program, but sometimes, we use both random and np.random in the same program. I'm unsure whether there are any best practices I should be following about how to seed the two RNGs together.

In particular, I'm worried that there's some sort of trap we could fall into where the two RNGs together behave in a "non-random" way, such as both generating the exact same sequence of random numbers, or one sequence trailing the other by a few values (e.g. the kth number from random is always the k+20th number from np.random), or the two sequences being related to each other in some other mathematical way. (I realise that pseudo-random number generators are all imperfect simulations of true randomness, but I want to avoid exacerbating this with poor seed choices.)

With this objective in mind, are there any particular ways we should or shouldn't seed the two RNGs? I've used, or seen colleagues use, a few different tactics, like:

  • Using the same arbitrary seed:

    import random
    import numpy as np
    random.seed(42)
    np.random.seed(42)

  • Using two different arbitrary seeds:

    random.seed(271828)
    np.random.seed(314159)
    

  • Using a random number from one RNG to seed the other:

    random.seed(42)
    # randint includes both endpoints, and np.random.seed only accepts 0..2**32 - 1
    np.random.seed(random.randint(0, 2**32 - 1))
    

... and I've never noticed any strange outcomes from any of these approaches... but maybe I've just missed them. Are there any officially blessed approaches to this? And are there any possible traps that I can spot and raise the alarm about in code review?

Recommended answer

I will discuss some guidelines on how multiple pseudorandom number generators (PRNGs) should be seeded. I assume you're not using random numbers for information security purposes (if you are, only a cryptographic RNG is appropriate and this advice doesn't apply).

    • To reduce the risk of correlated random numbers, you can use PRNG algorithms that support independent "streams" of random numbers, such as SFC64 or the so-called "counter-based" PRNGs like Philox (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011). There are other strategies as well, and I explain more about this in "Seeding Multiple Processes".
    • If you can use NumPy 1.17, note that that version introduced a new PRNG system and added SFC (SFC64) to its repertoire of PRNGs. For NumPy-specific advice on parallel random generation, see "Parallel Random Number Generation" in the NumPy documentation.
    • You should avoid seeding PRNGs (especially several at once) with timestamps.
    • You mentioned this question in a comment, when I started writing this answer. The advice there is not to seed multiple instances of the same kind of PRNG. This advice, however, doesn't apply as much if the seeds are chosen to be unrelated to each other, or if a PRNG with a very big state (such as Mersenne Twister) or a PRNG that gives each seed its own nonoverlapping random number sequence (such as SFC) is used. The accepted answer there (at the time of this writing) demonstrates what happens when multiple instances of .NET's System.Random, with sequential seeds, are used, but not necessarily what happens with PRNGs of a different design, PRNGs of multiple designs, or PRNGs initialized with unrelated seeds. Moreover, .NET's System.Random is a poor choice for a PRNG precisely because it allows only seeds no more than 32 bits long (so the number of random sequences it can produce is limited), and also because it has implementation bugs (if I understand correctly) that have been preserved for backward compatibility.
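
As a rough illustration of the points above, the sketch below derives the seeds for both modules from one recorded root seed instead of from a timestamp or two hand-picked constants. It assumes NumPy 1.17 or later and uses `numpy.random.SeedSequence`, whose `spawn()` method is documented as producing child seed sequences suitable for independent streams. The names `ROOT_SEED` and `make_rngs` are hypothetical and are not part of the original answer, and the sketch returns separate generator objects rather than calling `random.seed` or `np.random.seed` on the global state.

```python
import random

import numpy as np
from numpy.random import SeedSequence, default_rng

ROOT_SEED = 42  # hypothetical: any recorded constant, so runs can be repeated


def make_rngs(root_seed):
    """Return (stdlib_rng, numpy_rng), both derived from one root seed."""
    # spawn(2) yields two child SeedSequences that share the same root
    # entropy but are intended to give independent streams.
    py_child, np_child = SeedSequence(root_seed).spawn(2)

    # generate_state(4) returns four 32-bit words; pack them into a single
    # 128-bit integer, since random.Random accepts integers of any size.
    words = py_child.generate_state(4)
    py_seed = int.from_bytes(words.tobytes(), "little")

    return random.Random(py_seed), default_rng(np_child)


py_rng, np_rng = make_rngs(ROOT_SEED)
print(py_rng.random(), np_rng.random())
```

Passing `py_rng` and `np_rng` around explicitly, rather than seeding the module-level generators, also means that code elsewhere in the program cannot silently reseed or consume values from the streams a simulation relies on. If the global functions must be used, the same derived values could be fed to `random.seed` and `np.random.seed` instead, with the 32-bit limit of `np.random.seed` kept in mind.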
