RNG技术的可移植性和可重复性 [英] Portability and reproducibility of RNG techniques

查看:104
本文介绍了RNG技术的可移植性和可重复性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我可以使用两种方法之一来创建具有两个重要特征的伪随机数序列-(1)它可以在不同的机器上重现,并且(2)该序列永远不会在范围内重复任何数字,直到发出所有数字为止

I can use one of two methods to create a pseudo random number sequence that has two important characteristics - (1) it is reproducible on different machines, and (2) the sequence never repeats a number within range until all have been emitted.

我的问题是-这些方法中的任何一种是否都具有可移植性方面的潜在问题(操作系统,Python版本等)?例如,有人知道在XXX为true的情况下,我是否会在一个系统上得到一组结果,而在另一个系统上得到不同的结果?

仅当我在Z为true的情况下应注意在Y系统上使用X时,我才真正在寻求使用哪种方法的建议.

I'm not really asking for advice on which method to use per se, only if I should watch out for X on Y system when Z is true.

我已经尝试了几种Linux版本,全都是64位,而且看起来似乎是一致的,但是我无法轻松访问Windows或32位版本.

I have tried on a few versions of Linux, all 64bit, and they seem consistent, but I don't have easy access to Windows or 32 bit versions.

请注意,它们不会产生相同的范围,但这对我而言是可以的.这些数字只需要在人眼中随机出现即可.

Note that they don't produce the same ranges as each other, but that's OK for my purposes. The numbers just have to appear random to the human eye.

方法1
使用本机Python库函数从范围生成随机样本.如果我使用较大的范围(10m或更大),速度较慢,但​​对于较小的范围来说还可以,并且对于没有数学学位的人来说更容易理解:

Method 1
generates a random sample from a range using native Python library functions. It's slow if I use large ranges (10m or more) but OK for relatively small ones, and is much easier to understand for people without a degree in maths :

import random
random.seed(5)
x = random.sample(range(10000,99999),89999)
for i in range(10):
   print(x[i])

方法2
使用不是来自Python库的算法:
( https://en.wikipedia.org/wiki/Linear_congruential_generator )
即使在很大的范围内,它也非常快,但是很难理解,因此可以发现潜在的问题:

Method 2
uses an algorithm not from a Python library :
(https://en.wikipedia.org/wiki/Linear_congruential_generator)
It's super fast even for massive ranges, but harder to understand and therefore spot potential issues with :

def lcg(modulus, a, c, seed):
  while True:
    seed = (a * seed + c) % modulus
    yield seed


m = 10000019
c = int(m/2)
a = 5653
s = a

g = lcg(m,a,c,s)
for _ in range(10):
  print(next(g))

请注意,我对替代品持开放态度;最初的问题在这里被问到: https ://math.stackexchange.com/questions/3289084/generate-a-pseudo-random-predictable-non-repeating-integer-sequence-purely-math

Note I am more than open to alternatives; the original question was asked here : https://math.stackexchange.com/questions/3289084/generate-a-pseudo-random-predictable-non-repeating-integer-sequence-purely-math

推荐答案

大多数便携式版本IMO将是LCG,其周期等于计算机的自然字大小.它为模块使用寄存器溢出,从而使其更快.您必须使用NumPy数据类型来执行此操作,这是简单的代码,常量a,c取自表4

Most portable version, IMO, would be LCG with period equal to natural word size of the machine. It uses register overflow for module which makes it even faster. You have to use NumPy datatypes to do that, here is simple code, constants a, c are taken from Table 4 here

import numpy as np

def LCG(seed: np.uint64, a: np.uint64, c: np.uint64) -> np.uint64:
    with np.errstate(over='ignore'):
        while True:
            seed = (a * seed + c)
            yield seed

a = np.uint64(2862933555777941757)
c = np.uint64(1)

rng64 = LCG(np.uint64(17), a, c)

print(next(rng64))
print(next(rng64))
print(next(rng64))

Linux x64和Windows x64以及OS X VM的工作原理完全相同.

Both Linux x64 and Windows x64 as well as OS X VM works exactly the same.

关于可重复性,唯一的好处是存储第一对数字,并在应用程序初始化阶段将它们与LCG输出进行比较-如果可以,请继续进行操作.

Concerning reproducibility, the only good is to store first couple numbers and compare them with LCG output during app initialization stage - if they are ok, you proceed further.

我喜欢的LCG的另一个功能是它能够以log 2 (N)的时间向前跳,其中N是跳过的次数.我可以为您提供执行此操作的代码.使用跳转可以确保并行独立随机流的非重叠序列

Another feature of the LCG I like is it's ability to jump ahead in log2(N) time, where N is number of skips. I could provide you with code to do that. Using jump ahead you could ensure non-overlapping sequences for parallel independent random streams

更新

这是将我的C代码转换为Python/NumPy的方法,似乎可以正常工作.它可以在对数时间中向前和向后跳过.

Here is translation of my C code into Python/NumPy, seems to work. It could skip forward as well as backward in logarithmic time.

import numpy as np

class LCG(object):

    UZERO: np.uint64 = np.uint64(0)
    UONE : np.uint64 = np.uint64(1)

    def __init__(self, seed: np.uint64, a: np.uint64, c: np.uint64) -> None:
        self._seed: np.uint64 = np.uint64(seed)
        self._a   : np.uint64 = np.uint64(a)
        self._c   : np.uint64 = np.uint64(c)

    def next(self) -> np.uint64:
        self._seed = self._a * self._seed + self._c
        return self._seed

    def seed(self) -> np.uint64:
        return self._seed

    def set_seed(self, seed: np.uint64) -> np.uint64:
        self._seed = seed

    def skip(self, ns: np.int64) -> None:
        """
        Signed argument - skip forward as well as backward

        The algorithm here to determine the parameters used to skip ahead is
        described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
        Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
        O(log2(N)) operations instead of O(N). It computes parameters
        A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
        """

        nskip: np.uint64 = np.uint64(ns)

        a: np.uint64 = self._a
        c: np.uint64 = self._c

        a_next: np.uint64 = LCG.UONE
        c_next: np.uint64 = LCG.UZERO

        while nskip > LCG.UZERO:
            if (nskip & LCG.UONE) != LCG.UZERO:
                a_next = a_next * a
                c_next = c_next * a + c

            c = (a + LCG.UONE) * c
            a = a * a

            nskip = nskip >> LCG.UONE

        self._seed = a_next * self._seed + c_next    


np.seterr(over='ignore')

a = np.uint64(2862933555777941757)
c = np.uint64(1)
seed = np.uint64(1)

rng64 = LCG(seed, a, c) # initialization

print(rng64.next())
print(rng64.next())
print(rng64.next())

rng64.skip(-3) # back by 3
print(rng64.next())
print(rng64.next())
print(rng64.next())

rng64.skip(-3) # back by 3
rng64.skip(2) # forward by 2
print(rng64.next())

无论如何,LCG RNG的摘要:

Anyway, summary of the LCG RNG:

  1. 具有良好的常量(请参阅L'Ecuyer论文的参考文献),它将覆盖整个[0 ... 2 64 )范围,而无需重复自身.基本完美的[0 ... 2 64 )-> [0 ... 2 64 )映射,您可以设置 0,1,2,3,...作为输入并获得整个范围的输出
  2. 这是可逆的,您可以找回以前的种子,因此实际上是映射 bijection,[0 ... 2 64 )<-> [0 ... 2 64 ).有关详细信息,请参见可逆伪随机序列生成器
  3. 它具有对数向前和向后跳跃,因此没有问题可以找到 并行计算的合适时间间隔-从单个种子开始,然后下一个线程将被跳过(种子,2 64 /N),下一个线程将被跳过(种子,2 64 /N * 2),依此类推.保证不会重叠
  4. 虽然不是高质量的RNG,但操作简单,快捷
  1. With good constants (see reference to L'Ecuyer paper) it will cover whole [0...264) range without repeating itself. Basically perfect [0...264) -> [0...264) mapping, you could set 0,1,2,3,... as input and get whole range output
  2. It is reversible, you could get previous seed back so mapping is actually bijection, [0...264) <-> [0...264). See Reversible pseudo-random sequence generator for details
  3. It has logarithmic skip forward and backward, so there is no problem to find suitable intervals for parallel computation - start with single seed, and then next thread would be skip(seed, 264/N), next thread skip(seed, 264/N * 2) and so on and so forth. Guaranteed to not overlap
  4. It is simple and fast, though not a very high quality RNG

这篇关于RNG技术的可移植性和可重复性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆