Is this proper use of numpy seeding for parallel code?

Question

I am running n instances of the same code in parallel and want each instance to use independent random numbers.

For this purpose, before I start the parallel computations I create a list of random states, like this:

import numpy.random as rand
rand_states = [(rand.seed(rand.randint(2**32-1)),rand.get_state())[1] for j in range(n)]

I then pass one element of rand_states to each parallel process, in which I basically do

rand.set_state(rand_state)
data = rand.rand(10,10)

To make things reproducible, I run np.random.seed(0) at the very beginning of everything.

Does this work the way I hope it does? Is this a proper way to achieve it?

(I cannot just store the data arrays themselves beforehand, because (i) there are a lot of places where random numbers are generated in the parallel processes and (ii) that would introduce unnecessary logic coupling between the parallel code and the managing nonparallel code and (iii) in reality I run M slices across N<M processors and the data for all M slices is too big to store)

Recommended answer

numpy.random.set_state sets, and numpy.random.get_state reads, the state of the global instance of the NumPy generator. However, each parallel process should use its own instance of a PRNG instead. NumPy 1.17 and later provides a numpy.random.Generator class for this purpose. (In fact, numpy.random.get_state and the other numpy.random.* functions have been legacy functions since NumPy 1.17. NumPy's new RNG system was the result of a proposal to change the RNG policy.)
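As a rough illustration of "one PRNG instance per process", the sketch below rewrites the question's pattern so that each task builds its own Generator instead of restoring the global state. The names ROOT_SEED and make_data are hypothetical, and seeding with the pair [task_id, root_seed] is one possible convention, not something the question or answer prescribes.

```python
import numpy as np

ROOT_SEED = 0  # hypothetical root seed, standing in for np.random.seed(0)

def make_data(task_id, root_seed=ROOT_SEED):
    # Each task creates its own Generator; the global numpy.random
    # state is never read or modified.
    rng = np.random.default_rng([task_id, root_seed])
    return rng.random((10, 10))

n = 4  # number of parallel tasks (assumed value for the sketch)
results = [make_data(j) for j in range(n)]
```

Because only two integers are passed to each task, nothing large has to be stored up front, and rerunning a task with the same task_id reproduces its data.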

An excellent way to seed multiple processes is to make use of so-called "counter-based" PRNGs (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011) and other PRNGs that give each seed its own non-overlapping "stream" of random numbers. An example is the bit generator numpy.random.SFC64, newly added in NumPy 1.17.
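A minimal sketch of using the SFC64 bit generator mentioned above, assuming each task is identified by an integer. The helper make_rng and the [task_id, root_seed] seeding convention are hypothetical choices for illustration.

```python
import numpy as np
from numpy.random import Generator, SFC64

def make_rng(task_id, root_seed=0):
    # Wrap an SFC64 bit generator, seeded differently for each task,
    # in a Generator that exposes the usual sampling methods.
    return Generator(SFC64([task_id, root_seed]))

rngs = [make_rng(j) for j in range(3)]
samples = [rng.random(5) for rng in rngs]
```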

There are several other strategies for seeding multiple processes, but almost all of them involve having each process use its own PRNG instance rather than sharing a global PRNG instance (as the legacy numpy.random.* functions do). These strategies are explained in my section "Seeding Multiple Processes", which is not NumPy-specific, and in the page "Parallel Random Number Generation" in the NumPy documentation.
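One strategy of this kind that NumPy supports is spawning child seeds from a single root with numpy.random.SeedSequence. The sketch below assumes n tasks and a hypothetical worker function; it runs the tasks sequentially for simplicity, but in practice each child seed would be handed to a separate process.

```python
import numpy as np

n = 4  # number of tasks (assumed value for the sketch)
root = np.random.SeedSequence(0)   # single root seed for reproducibility
child_seeds = root.spawn(n)        # one child SeedSequence per task

def worker(child_seed):
    # Hypothetical worker: builds a private Generator from its child seed.
    rng = np.random.default_rng(child_seed)
    return rng.random((10, 10))

data = [worker(s) for s in child_seeds]
```

A worker that needs further independent streams can call spawn on its own child seed, so the managing code does not need to know how many random streams the parallel code uses.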
