Is this proper use of numpy seeding for parallel code?

Question

I am running n instances of the same code in parallel and want each instance to use independent random numbers.

For this purpose, before I start the parallel computations I create a list of random states, like this:

import numpy.random as rand
rand_states = [(rand.seed(rand.randint(2**32-1)),rand.get_state())[1] for j in range(n)]

I then pass one element of rand_states to each parallel process, in which I basically do

rand.set_state(rand_state)
data = rand.rand(10,10)

To make things reproducible, I run np.random.seed(0) at the very beginning of everything.

Does this work the way I hope it does? Is this the proper way to achieve it?

(I cannot just store the data arrays themselves beforehand, because (i) there are a lot of places where random numbers are generated in the parallel processes and (ii) that would introduce unnecessary logic coupling between the parallel code and the managing nonparallel code and (iii) in reality I run M slices across N<M processors and the data for all M slices is too big to store)

Recommended answer

numpy.random.set_state sets, and numpy.random.get_state reads, the state of the single global instance of NumPy's legacy generator. However, each parallel process should use its own PRNG instance instead. NumPy 1.17 and later provides the numpy.random.Generator class for this purpose. (In fact, numpy.random.get_state and the other numpy.random.* functions have been legacy functions since NumPy 1.17. NumPy's new RNG system was the result of a proposal to change the RNG policy, NEP 19.)
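As a minimal sketch of what "its own PRNG instance" can look like, the snippet below builds a local Generator inside the function that needs random numbers. The function name make_data and the seed value 12345 are made up for illustration; numpy.random.default_rng is NumPy's constructor for a Generator.

```python
import numpy as np

def make_data(seed):
    # Each call builds its own Generator; the global legacy state is not used.
    rng = np.random.default_rng(seed)
    return rng.random((10, 10))

# Record the global legacy state to show that it is left untouched.
np.random.seed(0)
state_before = np.random.get_state()

a = make_data(12345)  # 12345 is an arbitrary illustrative seed
b = make_data(12345)  # same seed, so the same numbers are produced

state_after = np.random.get_state()
```

Because the generator is an ordinary object, it can also be passed around to every place in the worker that draws random numbers, instead of relying on module-level state.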

An excellent way to seed multiple processes is to use so-called "counter-based" PRNGs (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011) or other PRNGs that give each seed its own non-overlapping "stream" of random numbers. An example is the bit generator numpy.random.SFC64, newly added in NumPy 1.17. NumPy 1.17 also added numpy.random.Philox, a counter-based bit generator of the kind described by Salmon et al.
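One way this could look in practice, as a sketch rather than a definitive recipe: each worker builds a Generator on top of SFC64, seeded with a sequence made of its own worker ID and a shared root seed. The names ROOT_SEED and make_rng are hypothetical; the [worker_id, root_seed] seeding pattern is the one described in NumPy's parallel random number generation documentation.

```python
import numpy as np
from numpy.random import Generator, SFC64

ROOT_SEED = 0  # hypothetical shared seed, chosen once for reproducibility

def make_rng(worker_id, root_seed=ROOT_SEED):
    # The sequence [worker_id, root_seed] is hashed into the initial state,
    # so each worker ID gets a different stream from the same root seed.
    return Generator(SFC64([worker_id, root_seed]))

# Four hypothetical workers, each drawing three numbers from its own stream.
samples = [make_rng(i).random(3) for i in range(4)]
```

Only the worker ID and the root seed need to reach each process, so nothing larger than two integers has to be shipped from the managing code.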

There are several other strategies for seeding multiple processes, but almost all of them involve having each process use its own PRNG instance rather than sharing a global PRNG instance (as with the legacy numpy.random.* functions such as numpy.random.seed). These strategies are explained in my section "Seeding Multiple Processes", which is not NumPy-specific, and the page "Parallel Random Number Generation" in the NumPy documentation.
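One of the strategies on that NumPy documentation page is numpy.random.SeedSequence.spawn. The sketch below applies it to the setup in the question: the managing code spawns one child seed per slice and hands each to a worker. The names run_slice and n_slices are hypothetical, and the list comprehension stands in for whatever actually dispatches the work to processes.

```python
import numpy as np
from numpy.random import SeedSequence, default_rng

def run_slice(child_seed):
    # Each slice builds its own Generator from the child seed it was given.
    rng = default_rng(child_seed)
    return rng.random((10, 10))

n_slices = 8                          # hypothetical number of slices (M)
root = SeedSequence(0)                # plays the role of np.random.seed(0)
child_seeds = root.spawn(n_slices)    # one child seed per slice

# Stand-in for the parallel dispatch: each call could run in its own process.
results = [run_slice(s) for s in child_seeds]
```

With this layout, the managing code only fixes the root seed and the number of slices; the workers do not need to know how the seeds were derived.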
