Random seed across different PBS jobs


Question

I am trying to generate random numbers in Matlab that differ across multiple PBS jobs (I am using a job array). Each Matlab job runs a parallel parfor loop in which the random numbers are generated, something like this:

parfor k = 1:10      
  tmp = randi(100, [1 200]);
end

However, when I plot my results, I see that the results from different jobs are not completely random. I cannot quantify it, e.g., by saying the numbers are exactly the same, since my results are a function of the random numbers, but it is unmistakable in the plots. I tried to initialize the random seed in each job, using the process ID and/or the clock:

rngSeed = feature('getpid'); % OR: rngSeed = RandStream.shuffleSeed;
rng(rngSeed);

But this didn't solve the problem. I also tried pausing for a different number of seconds in each job before calling shuffleSeed (which is clock based).

All this made me think that parfor is somehow interfering with the random seed. That would make sense if parfor needs to ensure that different iterations of the loop get different random numbers.

My questions are: is this really the case, and how can I solve it and get randomness across different PBS jobs?

EDIT: Running 4 jobs, each using parfor with 2 workers, I verified that although each job has its own seed (set outside the parfor), the numbers generated are identical across jobs (not across iterations of the parfor; that is handled by Matlab).

EDIT 2: Trying what was suggested by @Sam Roberts, I used the following code:

matlabpool open local 2
st = RandStream('mlfg6331_64');
RandStream.setGlobalStream(st);
rng('shuffle');

parfor n = 1:4       
  x=randi(100,[1 10]);
  fprintf('%d ',x(:)');
  fprintf('\n')
end
matlabpool close

But I still get the same numbers on different calls to the above script.

Recommended answer

You may want to look into using random substreams, for correct randomness and reproducibility when running in parallel.

The RandStream class allows you to create a pseudorandom number stream. Numbers drawn from this stream have the properties you'd hope for (independence, etc.) and, if you control the seed, you also have reproducibility.
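As a minimal sketch of the reproducibility point above: a stream created with a fixed seed returns the same numbers again after it is reset. The generator type and the seed value 42 are arbitrary choices made for illustration, not something taken from the question.

```matlab
% Create a private stream with an explicit seed (42 is an arbitrary example value).
s = RandStream('mrg32k3a', 'Seed', 42);

% Draw five integers in 1..100 from this stream only (the global stream is untouched).
a = randi(s, 100, [1 5]);

% Return the stream to its initial state and draw again.
reset(s);
b = randi(s, 100, [1 5]);

% Same seed and same starting state give the same sequence.
sameSequence = isequal(a, b);
```

Drawing from an explicit stream object, rather than the global stream, keeps the result independent of whatever other code has done to the global generator.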

But it may not be the case that, for example, every second or every fourth number drawn from the stream has the same properties. In addition, when you use parfor you have no control over the order in which the loop iterations are run, which means that you will lose reproducibility. To address this, you can use a different substream on each worker within a parfor loop.

Some RNGs, for example mlfg6331_64 (a multiplicative lagged Fibonacci generator) or mrg32k3a (a combined multiple recursive generator), support substreams: independent streams that are generated by the same RNG, retain the same pseudorandom properties, and can be selected individually, which preserves reproducibility. In addition, many MATLAB and Toolbox functions have 'UseParallel' and 'UseSubstreams' options, which tell them to handle this for you automatically.
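One way this could look for the job-array case in the question is sketched below. Calling rng on the client does not reseed the pool workers, so the sketch creates the stream inside the loop and gives every (job, iteration) pair its own substream. The environment variable name PBS_ARRAYID is an assumption (it depends on the scheduler; PBS Pro uses PBS_ARRAY_INDEX), and jobIndex, nIter and results are names introduced here for illustration.

```matlab
% Assumption: the scheduler exposes the array index as PBS_ARRAYID.
jobIndex = str2double(getenv('PBS_ARRAYID'));
if isnan(jobIndex)
    jobIndex = 1;   % fallback when run outside a job array
end

nIter = 10;                    % number of parfor iterations per job
results = zeros(nIter, 200);   % one row of random integers per iteration

parfor k = 1:nIter
    % Same generator and seed everywhere; only the substream differs.
    s = RandStream('mrg32k3a', 'Seed', 0);

    % Unique substream per (job, iteration): job 1 uses 1..nIter,
    % job 2 uses nIter+1..2*nIter, and so on.
    s.Substream = (jobIndex - 1) * nIter + k;

    % Draw from the explicit stream instead of the worker's global stream.
    results(k, :) = randi(s, 100, [1 200]);
end
```

Because each row depends only on jobIndex and k, the output does not change with the number of workers or the order in which iterations run, and no two jobs share a substream as long as nIter is the same in every job.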

Although the above is documented at a technical level within the MATLAB documentation, it's kind of hard to find. There's a much more explanatory guide within the Statistics Toolbox documentation (it should really be moved to MATLAB, if you ask me), which is also available online.

Hope that helps!
