How to share numpy random state of a parent process with child processes?

Problem description

I set the numpy random seed at the beginning of my program. During execution I run a function multiple times using multiprocessing.Process. The function uses numpy random functions to draw random numbers. The problem is that Process gets a copy of the current environment, so each process runs independently and they all start from the same random state as the parent environment.
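
To see why every child draws the same numbers: a child created with the `fork` start method (traditionally the default on Linux) begins with a copy of the parent's memory, including NumPy's global generator state. The sketch below simulates that copy inside a single process with `get_state()`/`set_state()`, so it behaves the same on every platform. The helper name `draws_from_copied_state` is hypothetical.

```python
import numpy as np


def draws_from_copied_state(state, n=3):
    """Hypothetical helper: mimic a child that resumes from a copy of the parent's RNG state."""
    rng = np.random.RandomState()
    rng.set_state(state)  # start exactly where the parent's generator was
    return rng.uniform(size=n).tolist()


np.random.seed(1)
parent_state = np.random.get_state()  # snapshot only; does not advance the generator

child_a = draws_from_copied_state(parent_state)
child_b = draws_from_copied_state(parent_state)
parent_next = np.random.uniform(size=3).tolist()

# Both simulated children and the parent itself produce the same three numbers.
print(child_a == child_b == parent_next)
```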

So my question is how can I share the random state of numpy in the parent environment with the child process environment? Just note that I want to use Process for my work and need to use a separate class and do import numpy in that class separately. I tried using multiprocessing.Manager to share the random state but it seems that things do not work as expected and I always get the same results. Also, it does not matter if I move the for loop inside drawNumpySamples or leave it in main.py; I still cannot get different numbers and the random state is always the same. Here's a simplified version of my code:

    # randomClass.py
    import numpy as np


    class myClass:
        def __init__(self, randomSt):
            print('setup the object')
            # Copy the Manager's list proxy into a plain tuple, the form get_state() returns
            np.random.set_state(tuple(randomSt))

        def drawNumpySamples(self, idx):
            return np.random.uniform()

In the main file:

    # main.py
    import numpy as np
    from multiprocessing import Process, Manager
    from randomClass import myClass

    if __name__ == '__main__':
        np.random.seed(1)  # set random seed
        mng = Manager()
        randomState = mng.list(np.random.get_state())
        myC = myClass(randomSt=randomState)

        for i in range(10):
            myC.drawNumpySamples(i)  # this will always return the same results

Note: I use Python 3.5. I also posted an issue on NumPy's GitHub page; I'm sending the issue link here for future reference.

Recommended answer

Even if you manage to get this working, I don’t think it will do what you want. As soon as you have multiple processes pulling from the same random state in parallel, it’s no longer deterministic which order they each get to the state, meaning your runs won’t actually be repeatable. There are probably ways around that, but it seems like a nontrivial problem.
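
A small single-process simulation of that argument: two children, `A` and `B`, each pull one number from one shared, seeded generator. The value a child receives depends only on the order in which the children reach the generator, and across real processes that order is decided by the OS scheduler. The function name `assign_draws` is hypothetical.

```python
import numpy as np


def assign_draws(arrival_order, seed=1):
    """Hypothetical helper: hand out one draw per child from a single shared generator."""
    shared = np.random.RandomState(seed)  # stands in for the shared parent state
    values = {}
    for child in arrival_order:
        values[child] = shared.uniform()  # whoever arrives first gets the first number
    return values


run1 = assign_draws(["A", "B"])  # A happened to be scheduled first
run2 = assign_draws(["B", "A"])  # B happened to be scheduled first

# Same seed and the same pair of numbers overall, but the children swap values,
# so a given child's result is not repeatable from run to run.
print(run1["A"] == run2["B"], run1["A"] == run2["A"])
```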

Meanwhile, there is a solution that should solve both your original problem and the nondeterminism problem:

Before spawning each child process, ask the parent's RNG for a random number and pass it to the child, which then seeds its own RNG with that number. Each child will then have a random sequence that differs from the other children's, yet if you rerun the entire app with the same fixed seed, every child gets exactly the sequence it got the previous time.
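
A minimal sketch of this approach with `multiprocessing.Process`, assuming the children only use NumPy's legacy global RNG (`np.random.seed` / `np.random.uniform`, as in the question). The names `draw_samples`, `child`, `run` and `SEED_BOUND` are hypothetical; results come back through a `multiprocessing.Queue` so the parent can compare two runs.

```python
import multiprocessing as mp

import numpy as np

# Conservative exclusive upper bound for child seeds. np.random.seed() accepts
# anything in [0, 2**32); this just keeps the value within a signed 32-bit int.
SEED_BOUND = 2**31 - 1


def draw_samples(seed, n=3):
    """Seed this process's global NumPy RNG and draw n uniform numbers."""
    np.random.seed(seed)
    return np.random.uniform(size=n).tolist()


def child(idx, seed, queue):
    # Runs in the child process: report which child this is and what it drew.
    queue.put((idx, draw_samples(seed)))


def run(master_seed, n_children=4):
    np.random.seed(master_seed)
    queue = mp.Queue()
    procs = []
    for idx in range(n_children):
        # Ask the parent's RNG for a seed just before spawning and hand it over.
        seed = int(np.random.randint(SEED_BOUND))
        proc = mp.Process(target=child, args=(idx, seed, queue))
        proc.start()
        procs.append(proc)
    # Collect one result per child before joining, so no child blocks on the queue.
    results = dict(queue.get() for _ in procs)
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    first = run(master_seed=1)
    second = run(master_seed=1)
    print(first == second)  # same master seed: every child redraws the same numbers
    print(first)
```

Because each child reseeds itself from an integer it receives as an argument, this does not rely on the child inheriting the parent's memory, so it should behave the same under the `fork`, `spawn` and `forkserver` start methods.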

If your main process does any other RNG work that could depend non-deterministically on the execution of the children, you'll need to pre-generate the seeds for all of your child processes, in order, before pulling any other random numbers.
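
The ordering rule can be checked without any processes. In this sketch, `seeds_planned_up_front` draws every child seed before doing other parent-side RNG work, while the counter-example `seeds_after_parent_work` does that work first. The parameter `parent_draws` is a hypothetical stand-in for however much parent RNG work happens to run, which in the real program could vary with child timing. Both function names and `SEED_BOUND` are assumptions.

```python
import numpy as np

SEED_BOUND = 2**31 - 1  # keeps every seed valid for np.random.seed() on all platforms


def seeds_planned_up_front(master_seed, n_children, parent_draws):
    """Draw all child seeds first, in order, then do any other parent RNG work."""
    np.random.seed(master_seed)
    child_seeds = np.random.randint(SEED_BOUND, size=n_children).tolist()
    np.random.uniform(size=parent_draws)  # other parent work; cannot change child_seeds
    return child_seeds


def seeds_after_parent_work(master_seed, n_children, parent_draws):
    """Counter-example: parent RNG work happens before the seeds are drawn."""
    np.random.seed(master_seed)
    np.random.uniform(size=parent_draws)  # the amount of this work may vary between runs
    return np.random.randint(SEED_BOUND, size=n_children).tolist()


# Seeds drawn up front do not depend on how much other parent work happens.
print(seeds_planned_up_front(1, 4, 0) == seeds_planned_up_front(1, 4, 10))
# Seeds drawn afterwards shift whenever the amount of parent work changes.
print(seeds_after_parent_work(1, 4, 0) == seeds_after_parent_work(1, 4, 10))
```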

As senderle pointed out in a comment: If you don't need multiple distinct runs, but just one fixed run, you don't even really need to pull a seed from your seeded RNG; just use a counter starting at 1 and increment it for each new process, and use that as a seed. I don't know if that's acceptable, but if it is, it's hard to get simpler than that.
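
A sketch of the counter variant. Child number `i` (counting from 0) simply gets seed `i + 1`, and no master RNG is consulted at all. The function name `child_draws` is hypothetical, and the spawning itself is left out: in the real program each seed would be passed in the child's `args`, for example `Process(target=..., args=(seed,))`.

```python
import numpy as np


def child_draws(seed, n=3):
    """Hypothetical child body: seed the global RNG with the integer it was given."""
    np.random.seed(seed)
    return np.random.uniform(size=n).tolist()


# Counter-based seeds: 1, 2, 3, 4 for the first four children.
results = {seed: child_draws(seed) for seed in range(1, 5)}

# A rerun with the same counter reproduces every child's numbers exactly.
print(results == {seed: child_draws(seed) for seed in range(1, 5)})
```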

As Amir pointed out in a comment: a better way is to draw a random integer every time you spawn a new process and pass that integer to the new process, which uses it to seed NumPy's random generator. This integer can indeed come from np.random.randint().
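
Applied to the question's code, this means `myClass` takes an integer instead of a shared state, and the object is constructed inside the child so the seeding affects the child's own NumPy. The sketch keeps the question's names `myClass` and `drawNumpySamples` but puts everything in one file for brevity (the question keeps the class in `randomClass.py`). The helper `child_main` and the seed bound `2**31 - 1` are assumptions.

```python
import multiprocessing as mp

import numpy as np


class myClass:
    def __init__(self, seed):
        # Seeds the global RNG of whichever process constructs the object.
        np.random.seed(seed)

    def drawNumpySamples(self, idx):
        return np.random.uniform()


def child_main(idx, seed):
    # Build the object here, in the child, so the seed applies to this process.
    obj = myClass(seed)
    samples = [obj.drawNumpySamples(i) for i in range(3)]
    print(idx, seed, samples)
    return samples


if __name__ == "__main__":
    np.random.seed(1)  # one fixed master seed makes the whole run repeatable
    procs = []
    for idx in range(4):
        seed = int(np.random.randint(2**31 - 1))  # fresh integer per spawned process
        procs.append(mp.Process(target=child_main, args=(idx, seed)))
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
```

The printed lines may appear in a different order from run to run, but with a fixed master seed each `idx` should always print the same samples.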
