Make PRNGs Agree Across Software

Question

I am investigating whether it is possible to have two sets of software agree on a sequence of produced pseudo-random numbers. I am as interested in understanding all the possible points of divergence as I am in actually finding a way to get them to agree.

Why? I work in a data shop that uses many different software packages (Stata, R, Python, SAS, probably others). There has recently been interest in QCing outputs by replicating processes in another language. For any process that involves random numbers, it would be helpful if we could provide a series of steps ("set this option", etc.) that allow the two packages to agree. If that's not feasible, I'd like to be able to articulate where the failure points are.

A simple example:

The default random number generator in both R and Python is Mersenne-Twister. I set both to the same seed, then sample from the PRNG and also look at its "state". Neither the samples nor the states agree.

R (3.2.3, 64-bit):

set.seed(20160201)
.Random.seed
sample(c(1, 2, 3, 4, 5))

Python (3.5.1, 64-bit):

import random

random.seed(20160201)
random.getstate()
random.sample([1, 2, 3, 4, 5], 5)

Recommended answer

Old question, but maybe useful to some future reader: As alluded to in the comments, your best bet is to implement this yourself and provide interfaces for the different environments, such that the same results are returned for a given seed. Why is that necessary? You used "sampling" as an example. There are several steps involved.

  1. Seeding is a non-trivial process. For example, R goes as far as to further scramble the provided seed. So unless your tools use the same method, they will end up with a different initial state even when the user supplies the same value.

  2. The actual RNG: Even though Mersenne-Twister might be used in both cases, is it really the same version? R uses a 32-bit MT. Maybe Python uses a 64-bit version?

  3. Most RNGs give you an unsigned integer (nowadays typically 32 or 64 bits). But you will need some distribution of random numbers, e.g. for sampling you need random integers within a given range. There are many methods to go from the integers produced by the RNG to those needed for sampling. In the case of R, you do not even have access to the output value of the RNG. The most fundamental function is R_unif, which returns a double in [0, 1). Again, how to generate such a double is not universally agreed on. And if you need other distribution functions (normal, exponential, ...) you will find quite a few different algorithms for them.
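To make the last point concrete, here is a minimal Python sketch of two plausible conventions for turning the generator's 32-bit words into a double in [0, 1), plus two ways of mapping to an index. All helper names are made up for illustration. The two-word convention is, as far as I know, what CPython's random.random() does internally; the one-word convention is just a common alternative, and I am not claiming it is exactly what R does.

```python
import random


def raw32(rng):
    """Return the next 32 random bits as an unsigned integer.

    Assumption: in CPython this consumes one word of the underlying generator.
    """
    return rng.getrandbits(32)


def double_from_one_word(rng):
    # Convention A: scale a single 32-bit word into [0, 1).
    return raw32(rng) / 2**32


def double_from_two_words(rng):
    # Convention B: combine 27 + 26 bits from two words into a 53-bit fraction.
    a = raw32(rng) >> 5
    b = raw32(rng) >> 6
    return (a * 2**26 + b) / 2**53


def index_by_scaling(u, n):
    # Map a double in [0, 1) to an index in 0..n-1.
    return int(u * n)


def index_by_modulo(word, n):
    # Map a raw integer to an index in 0..n-1 (slightly biased for most n).
    return word % n


if __name__ == "__main__":
    seed = 20160201
    rng_a = random.Random(seed)
    rng_b = random.Random(seed)
    print([double_from_one_word(rng_a) for _ in range(3)])
    print([double_from_two_words(rng_b) for _ in range(3)])
```

With the same seed, the first values of the two lists agree only in their leading digits, and the later values are unrelated because convention B consumes two words per draw. Same generator, same seed, different numbers.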

Overall, there are too many places where (subtle) differences can creep in.
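As a rough illustration of what "implement it yourself" could look like, here is a small Python sketch in which every step (seeding, the generator, the mapping to a range, the shuffle) is spelled out in integer arithmetic, so it can be ported line by line to R, Stata, SAS, and so on. The class name PortableRng and the seeding rule are assumptions of this sketch. I use xorshift32 only because it is short; it is not Mersenne-Twister and not what any of those packages use by default.

```python
MASK32 = 0xFFFFFFFF


class PortableRng:
    """Hypothetical reference generator: xorshift32 with every step explicit."""

    def __init__(self, seed):
        # Seeding rule (an assumption of this sketch): keep the low 32 bits
        # and replace 0 with a fixed nonzero constant, because an all-zero
        # xorshift state would stay zero forever.
        state = seed & MASK32
        self.state = state if state != 0 else 0x9E3779B9

    def next_u32(self):
        # One xorshift32 step with shifts 13, 17, 5, truncated to 32 bits.
        x = self.state
        x ^= (x << 13) & MASK32
        x ^= x >> 17
        x ^= (x << 5) & MASK32
        self.state = x
        return x

    def below(self, n):
        # Integer-only mapping to 0..n-1, so no floating point is involved.
        # Slightly biased for most n, which is acceptable for a sketch.
        return (self.next_u32() * n) >> 32

    def shuffle(self, items):
        # Fisher-Yates shuffle on a copy, walking from the last index down.
        out = list(items)
        for i in range(len(out) - 1, 0, -1):
            j = self.below(i + 1)
            out[i], out[j] = out[j], out[i]
        return out


if __name__ == "__main__":
    rng = PortableRng(20160201)
    print(rng.shuffle([1, 2, 3, 4, 5]))
```

Because only 32-bit integer operations are used, a faithful port in another language should reproduce the same shuffle for the same seed. The price is that you no longer use each package's built-in sampling functions.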
