Why is numpy random seed not remaining fixed but RandomState is when run in parallel?


Question

I am running a Monte Carlo simulation in parallel using joblib. I noticed, however, that although my seeds were fixed, my results kept changing. When I ran the process in series, the results remained constant, as I expected.

Below I implement a small example that simulates the mean of a normal distribution with a higher variance.

Load the libraries and define the function:

import numpy as np
from joblib import Parallel, delayed

def _estimate_mean():
    np.random.seed(0)
    x = np.random.normal(0, 2, size=100)
    return np.mean(x)

The first example I implement in series; the results are all the same, as expected.

tst = [_estimate_mean() for i in range(8)]
In [28]: tst
Out[28]:
[0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897]

The second example I implement in parallel (note that sometimes the means are all the same, and other times they are not):

tst = Parallel(n_jobs=-1, backend="threading")(delayed(_estimate_mean)() for i in range(8))

In [26]: tst
Out[26]:
[0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.1640259414956747,
 -0.11846452111932627,
 -0.3935934130918206]

I expected the parallel run to give the same results, since the seed is fixed. I found that if I use RandomState to fix the seed, the problem seems to be resolved:

def _estimate_mean():
    local_state = np.random.RandomState(0)
    x = local_state.normal(0, 2, size=100)
    return np.mean(x)
tst = Parallel(n_jobs=-1, backend="threading")(delayed(_estimate_mean)() for i in range(8))

In [28]: tst
Out[28]:
[0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897,
 0.11961603106897]

What is the difference between using RandomState and just seed when fixing the seed with numpy.random, and why would the latter not work reliably when running in parallel?

System information

OS: Windows 10

Python: 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]

NumPy: 1.17.2

Recommended answer

The result you're getting with numpy.random.* is caused by a race condition. numpy.random.* uses a single global PRNG that is shared across all threads, and the threads' access to it is not synchronized. Because the threads run at the same time, they race to access the PRNG state, so the state can change behind another thread's back: in the example, one thread may reseed or draw from the global PRNG between another thread's np.random.seed(0) call and its np.random.normal call. Giving each thread its own PRNG (RandomState) solves the problem because there is no longer any state shared by multiple threads without synchronization.
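The interleaving described above can be reproduced deterministically in a single thread. The following is only a sketch: the helper name draw_mean is hypothetical, and the extra draw in the middle merely stands in for what another thread might do between the seed call and the draw.

```python
import numpy as np

def draw_mean():
    # Draws from the global PRNG, as np.random.* functions do.
    return np.mean(np.random.normal(0, 2, size=100))

# Undisturbed: seeding and drawing happen back to back.
np.random.seed(0)
clean = draw_mean()

# Disturbed: something else consumes numbers from the global PRNG
# between this task's seed() call and its draw. In the threaded run,
# that "something else" would be another thread.
np.random.seed(0)
np.random.normal(0, 2, size=100)  # stands in for another thread's draw
disturbed = draw_mean()

print(clean, disturbed)
```

The two means differ even though the seed is the same, which matches the mixed values seen in the threaded output of the question.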

Since you're using NumPy 1.17, you should know that there is a better alternative: NumPy 1.17 introduces a new random number generation system. It uses so-called bit generators, such as PCG, and random generators, such as the new numpy.random.Generator.

It was the result of a proposal to change the RNG policy, which states that numpy.random.* functions should generally not be used anymore, especially because numpy.random.* operates on global state.

The NumPy documentation now has detailed information on

  • seeding RNGs in parallel, and
  • multithreading RNGs,

in the new RNG system. See also "Seed Generation for Noncryptographic PRNGs", from an article of mine with general advice on RNG selection.
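For a real Monte Carlo run, each task usually needs a different but reproducible stream rather than the same seed everywhere. One way to sketch this with the new system is numpy.random.SeedSequence and its spawn method. The names run, root_seed and n_tasks are hypothetical, and ThreadPoolExecutor again stands in for joblib.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def _estimate_mean(child_seed):
    # Each task builds its own Generator from its own child seed.
    rng = np.random.default_rng(child_seed)
    return float(np.mean(rng.normal(0, 2, size=100)))

def run(root_seed=0, n_tasks=8):
    # Derive one independent child seed per task from a single root seed.
    children = np.random.SeedSequence(root_seed).spawn(n_tasks)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in task order, not completion order.
        return list(pool.map(_estimate_mean, children))

first = run()
second = run()
print(first == second)
```

With this layout, the tasks produce different estimates from one another, while the whole run repeats exactly for a given root seed regardless of how the threads are scheduled.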
