Seeding random number generators in parallel programs

Question

I am studying the multiprocessing module of Python. I have two cases:

Ex 1.

import random

def Foo(nbr_iter):
    for step in range(int(nbr_iter)):
        print(random.uniform(0, 1))
...

from multiprocessing import Pool

if __name__ == "__main__":
    ...
    pool = Pool(processes=nmr_parallel_block)
    pool.map(Foo, nbr_trial_per_process)

Ex 2. (using numpy)

def Foo_np(nbr_iter):
    np.random.seed()
    print(np.random.uniform(0, 1, nbr_iter))

In both cases the random number generators are seeded in their forked processes.

Why do I have to do the seeding explicitly in the numpy example, but not in the Python example?

Recommended answer

If no seed is provided explicitly, numpy.random will seed itself using an OS-dependent source of randomness. Usually it will use /dev/urandom on Unix-based systems (or some Windows equivalent), but if this is not available for some reason then it will seed itself from the wall clock. This self-seeding happens once, when the generator is first created in the parent process, and a forked subprocess starts with a copy of the parent's memory. Multiple subprocesses therefore inherit the same generator state unless they reseed, leading to identical random variates being produced by different subprocesses.
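To picture why an inherited generator state yields identical output, here is a minimal sketch that copies a generator's state by hand instead of forking. The names parent, child_a and child_b are illustrative only, and the seed value is arbitrary; the copy merely stands in for what a forked subprocess receives from its parent.

```python
import numpy as np

# Illustrative stand-in for the parent's generator (arbitrary seed).
parent = np.random.RandomState(12345)
snapshot = parent.get_state()

# Two "children" that start from a copy of the parent's state,
# roughly what each forked subprocess ends up with.
child_a = np.random.RandomState()
child_a.set_state(snapshot)
child_b = np.random.RandomState()
child_b.set_state(snapshot)

draws_a = child_a.uniform(0, 1, 5)
draws_b = child_b.uniform(0, 1, 5)   # same values as draws_a

# Reseeding one child from the OS entropy source breaks the tie.
child_b.seed()
draws_b_reseeded = child_b.uniform(0, 1, 5)
```

Until one of the copies is reseeded, both produce the same stream, which is the pattern visible when worker processes share a starting state.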

Often the number of identical results correlates with the number of worker processes you are running. For example:

import numpy as np
from multiprocessing import Pool

def Foo_np(seed=None):
    # Reseeding is deliberately commented out to show the problem.
    # np.random.seed(seed)
    return np.random.uniform(0, 1, 5)

if __name__ == "__main__":
    pool = Pool(processes=8)
    print(np.array(pool.map(Foo_np, range(20))))

# [[ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.28917586  0.40997875  0.06308188  0.71512199  0.47386047]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]]

You can see that groups of up to 8 worker processes started from the same inherited seed, giving me identical random sequences (I've marked the first group with arrows).

Calling np.random.seed() within a subprocess forces that process's RNG instance to seed itself again from /dev/urandom or the wall clock, which will (probably) prevent you from seeing identical output from multiple subprocesses. Best practice is to explicitly pass a different seed (or numpy.random.RandomState instance) to each subprocess, e.g.:

def Foo_np(seed=None):
    # Each call builds its own generator from the seed it was given.
    local_state = np.random.RandomState(seed)
    print(local_state.uniform(0, 1, 5))

pool.map(Foo_np, range(20))
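On newer NumPy releases (1.17 and later), one possible way to derive a distinct, reproducible stream per task is to build a SeedSequence from a shared root value plus the task index. This is a sketch under that version assumption, not part of the original answer: ROOT_ENTROPY, draw and run_parallel are hypothetical names, and the root value is arbitrary.

```python
import numpy as np
from multiprocessing import Pool

ROOT_ENTROPY = 2024  # arbitrary illustrative root seed


def draw(task_id):
    # The spawn_key makes each task's stream distinct while keeping
    # the whole run reproducible from ROOT_ENTROPY.
    seed_seq = np.random.SeedSequence(ROOT_ENTROPY, spawn_key=(task_id,))
    rng = np.random.default_rng(seed_seq)
    return rng.uniform(0, 1, 5)


def run_parallel(n_tasks=20, processes=8):
    # Only plain integers cross the process boundary.
    with Pool(processes=processes) as pool:
        return np.array(pool.map(draw, range(n_tasks)))


# Serial check of the same logic: every task gets a different row.
rows = np.array([draw(i) for i in range(20)])
```

Calling run_parallel() from under an `if __name__ == "__main__":` guard should give the same rows as the serial check, since each row depends only on the task index and the root value.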

I'm not entirely sure what underlies the differences between random and numpy.random in this respect (perhaps random has slightly different rules for selecting a source of randomness to self-seed with compared to numpy.random?). I would still recommend explicitly passing a seed or a random.Random instance to each subprocess to be on the safe side. In Python 2 you could also use the .jumpahead() method of random.Random, which is designed for shuffling the states of Random instances in multithreaded programs; it was removed in Python 3.
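The same explicit-seed advice can be sketched with the standard library alone. The helper names draw_uniform and run_parallel are made up for illustration, and the seeds are simply small integers; any scheme that hands each task a different seed would do.

```python
import random
from multiprocessing import Pool


def draw_uniform(seed, n=5):
    # A private generator per task; the module-level generator is untouched.
    local_rng = random.Random(seed)
    return [local_rng.uniform(0, 1) for _ in range(n)]


def run_parallel(seeds, processes=4):
    # Each task receives its own explicit seed.
    with Pool(processes=processes) as pool:
        return pool.map(draw_uniform, seeds)


# Serial check: distinct seeds give distinct, reproducible sequences.
results = [draw_uniform(seed) for seed in range(4)]
```

Because each sequence depends only on its seed, the output does not change with the number of worker processes or the order in which tasks are scheduled.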
