时空相关性(采用System.Random.TF时没有present)采用System.Random时 [英] Temporal correlations when employing System.Random (not present when employing System.Random.TF)

查看:192
本文介绍了时空相关性(采用System.Random.TF时没有present)采用System.Random时的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题涉及时间相关的起源可以观察到与 System.Random 当一个人产生从连续的种子连续偶合(其中一个丢弃发电机相同数量的每个种子)。

This question concerns the origins of temporal correlations one observes with System.Random when one generates successive randoms from successive seeds (where one discards the same number of generators for each seed).

使用mkStdGen从System.Random生成随机布尔回答1 和的Using mkStdGen从System.Random生成随机布尔答案2 有人建议(基于引用Theirin后裔的reddit的文章),其最初的几个发电机应以被丢弃得到有意义的结果。然而,我发现,不论多少发电机1丢弃的是,当我们考虑的分布的一个时间方面取得如果在连续的种子产生的连续的随机数(与一种丢弃发电机相同数量的每个种子)不希望的结果。

In Using mkStdGen from System.Random to generate random booleans Answer 1 and Using mkStdGen from System.Random to generate random booleans Answer 2 it was suggested (based on the reddit article referenced theirin) that the first few generators should be discarded in order to get sensible results. However I find that irrespective of how many generators one discards, when one looks at the temporal aspect of the distribution one obtains undesirable results if successive random numbers are generated with successive seeds (with one discarding the same number of generators for each seed).

我的问题是,究竟是什么在左右采用的算法 System.Random 的结果在不同的种子之间的时间相关性所描述的方式?

如果我们随机生成布尔值的无限序列,则概率 P(N)获得的 N 连续布尔这是相同的值(如 [真,真,真] [假,真,真,真,假] )的(1/2)^ N 。作为一个 快速检查,我们有归一化条件:

If we generate an infinite sequence of random booleans, then the probability P(n) of getting n successive booleans which are of the same value (e.g. the [True,True,True] in [False,True,True,True,False]) is (1/2)^n. As a quick check we have the normalisation condition :

P(1)+P(2)+....P(infty) = (1/2) + (1/2)^2 + ... = 1

下面code:

The following code :

module Main where
import Data.List
import System.Random

generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
  where newGen = snd $ ((random startGen) :: (Bool,StdGen)) 

better_mkStdGen generation seed = 
  generateNthGenerator (mkStdGen seed) generation

randomNums generation = 
  map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False] 

sortedLengthOfConsecutives num randList = 
  sort $ map length $ take num $ group randList

frequencyOfConsecutives sortedLengthOfCons = 
  map (\x -> (head x, length x)) $ group sortedLengthOfCons

results = frequencyOfConsecutives 
            $ sortedLengthOfConsecutives 10000
                $ randomNums 10

main = do
  print results -- [(8,1493),(9,8507)]

利用连续的种子发电机,丢弃10发电机使用生成的随机结果产生之前,每个连续的布尔值。 10000的随机数序列的生成,因此我们预期所应遵循相反的布尔大约5000布尔值(如 [真] [假,真,假,假] ),那里能是2500布尔这之后是相同的布尔,但再其次是相对的布尔(如 [真,真] [假,真,真,假] ),约1250布尔这组进三分球,等等。

generates each successive boolean using generators from the consecutive seed, discarding 10 generators before using the resulting random result. A sequence of 10000 random numbers is generated, and so we expect roughly 5000 booleans to be followed by the opposite boolean (e.g. [True] in [False,True,False,False]), for there to be 2500 booleans which are followed by the same boolean but then followed by the opposed boolean (e.g. [True,True] in [False,True,True,False]), about 1250 booleans which group into 3s, etc.

因此​​,从code以上,我们得到1493 8组,8507 9组。这不是预期的,我们获得了类似的结果,不论多少发电机被丢弃(在上述丢弃每个种子发生器的数量的例子为10)。 [注意,当我们做同样的实验 TF-随机我们没有得到相同的行为,后面看到。

So from the code above we get 1493 8-groups and 8507 9-groups. This is not what is expected, and we get similar results irrespective of how many generators are discarded (in the example above the number of generators discarded for each seed is 10). [Note when we do the same experiment with tf-random we don't get the same behaviour, see later on].

如果我们不是使用的 previously产生发生器产生连续的布尔值的(这是我猜的,它最初的设计中使用,因为随机时尚本身返回一个新的生成器),我们似乎得到更合理的结果:

If we instead generate successive booleans using the previously generated generator (which is I guess the fashion in which it was originally designed to be used, since random itself returns a new generator), we seem to get more reasonable results :

module Main where
import Data.List
import System.Random

generateRandomInner gen = 
  let (randBool, newGen) = (random gen)::(Bool,StdGen)
  in randBool:(generateRandomInner newGen)

generateRandoms seed =
  let (randBool, newGen) = (random $ mkStdGen seed)::(Bool,StdGen) 
  in randBool:(generateRandomInner newGen)

seed = 0

randomNums = generateRandoms seed

sortedLengthOfConsecutives num randList = 
  sort $ map length $ take num $ group randList

frequencyOfConsecutives sortedLengthOfCons = 
  map (\x -> (head x, length x)) $ group sortedLengthOfCons

results = frequencyOfConsecutives 
            $ sortedLengthOfConsecutives 10000
                $ randomNums
main = do 
  print results
  --[(1,4935),(2,2513),(3,1273),(4,663),(5,308),
  -- (6,141),(7,86),(8,45),(9,16),(10,12),(11,6),
  -- (12,1),(13,1)]

所以,我们得到4935单身(约等于0.5 * 10000),2513双对象(约等于0.5 ^ 2 * 10000),1273三倍(约等于0.5 ^ 3 * 10000)等,这是我们所期望的。

So we get 4935 singletons (roughly equals 0.5*10000), 2513 duples (roughly equals 0.5^2*10000), 1273 triples (roughly equals 0.5^3*10000) etc, which is what we expect.

因此​​,似乎是,如果我们产生(通过 System.Random ),其中每个连续的随机与连续种子,在那里我们丢弃的相同号产生的一个随机序列发电机每个种子,不希望相关仍然存在发电机抛弃,不论数量。

So it seems that if we generate (via System.Random) a random sequence where each successive random is generated with the successive seed, where we discard the same number of generators for each seed, an undesirable correlation persists irrespective number of generators discarded.

是什么样的随机数生成的算法的性能 System.Random 的结果呢?

请注意,如果我们采用上面失败的方法TF-随机(redditt文章),那么我们得到预期的结果:

Note that if we employ the failing method above with tf-random (redditt article) then we get the expected results :

module Main where
import Data.List
import System.Random
import System.Random.TF

generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
  where newGen = snd $ ((random startGen) :: (Bool,TFGen)) 

better_mkStdGen generation seed = 
  generateNthGenerator (seedTFGen (0,0,0,seed)) generation

randomNums generation = 
  map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False] 

sortedLengthOfConsecutives num randList = 
  sort $ map length $ take num $ group randList

frequencyOfConsecutives sortedLengthOfCons = 
  map (\x -> (head x, length x)) $ group sortedLengthOfCons

results = frequencyOfConsecutives 
            $ sortedLengthOfConsecutives 10000
                $ randomNums 10

main = do
  print results
  -- [(1,4867),(2,2573),(3,1243),(4,646),(5,329),
  -- (6,176),(7,80),(8,43),(9,26),(10,10),(11,4),
  -- (12,2),(19,1)]

即。 50%的单身人士,25%为双对象(2组),等等...

i.e. 50% are singletons, 25% are duples (groups of 2), etc...

推荐答案

让我们先来看看什么是code说,我们可以到达那里。

Let's begin by looking at what the code says, and we can get there.

首先,随机适用于布尔等价于:

randomB :: StdGen -> (Bool, StdGen)
randomB g = let (i, g') = next g in (i `mod` 2 == 1, g')

事实上,如果我替换随机 randomB 其中,这是适合你的程序中,我得到了相同的结果。关键在于,确定布尔值,我们关心的是下一个诠释值是奇数还是偶数。

In fact, if I replace random with randomB where that's appropriate in your program, I get identical results. The point is that for determining booleans, all we care about is whether the next Int value is even or odd.

接下来,让我们来看看定义 StdGen

Next, let's look at the definition of StdGen:

data StdGen 
 = StdGen Int32 Int32

于是两个的Int32 是状态。让我们来看看他们是如何与 mkStdGen 以及它们如何与调整接下来初始化:

So two Int32s are the state. Let's see how they're initialized with mkStdGen and how they're adjusted with next:

mkStdGen :: Int -> StdGen -- why not Integer ?
mkStdGen s = mkStdGen32 $ fromIntegral s

mkStdGen32 :: Int32 -> StdGen
mkStdGen32 s
 | s < 0     = mkStdGen32 (-s)
 | otherwise = StdGen (s1+1) (s2+1)
      where
    (q, s1) = s `divMod` 2147483562
    s2      = q `mod` 2147483398

...

stdNext :: StdGen -> (Int, StdGen)
-- Returns values in the range stdRange
stdNext (StdGen s1 s2) = (fromIntegral z', StdGen s1'' s2'')
    where   z'   = if z < 1 then z + 2147483562 else z
        z    = s1'' - s2''

        k    = s1 `quot` 53668
        s1'  = 40014 * (s1 - k * 53668) - k * 12211
        s1'' = if s1' < 0 then s1' + 2147483563 else s1'

        k'   = s2 `quot` 52774
        s2'  = 40692 * (s2 - k' * 52774) - k' * 3791
        s2'' = if s2' < 0 then s2' + 2147483399 else s2'

请注意两件有趣的事情:

Note two interesting things:

  1. 方式 S2 被初始化的保证,这将是1,除非你发送一个非常高的数字,以 mkStdGen ,在这种情况下,这将是2(也有在的Int32 范围少于200个值将初始化 S2 2。

  1. The way s2 is initialized guarantees that it will be 1 unless you send a very high number to mkStdGen, in which case it will be 2 (there are fewer than 200 values in the Int32 range that will initialize s2 to 2.

状态的两半保持鲜明 - 更新 S2 仅在previous取决于 S2 ,而不是在previous S1 ,反之亦然。

The two halves of the state are kept distinct - the updated s2 depends only on the previous s2, not on the previous s1, and vice versa.

因此​​,如果你检查,你得到了传递给 better_mkStdGen 代一定固定数量的发电机,它们的状态下半年将永远是相同的

As a consequence, if you examine the generators that you get out of a certain fixed number of generations passed to better_mkStdGen, the second half of their state will always be identical.

通过添加到您的程序尝试:

Try it by adding this to your program:

print $ map (dropWhile (/= ' ') . show . better_mkStdGen 10) [0 .. 20]

那么接下来的问题是,为什么在 S1 混合功能无法正确​​地混合的最后一位。需要注意的是它的编写方式, S1 K 将具有相同的奇偶 S1 ,所以 S1 状态,只有从previous不同的奇偶校验 S1 状态如果 S1 结束是小于零。

So then the question is, why the mixing function in s1 fails to mix the last bit properly. Note that the way it's written, s1' and k will have the same parity as s1, so the s1 state only has different parity from the previous s1 state if s1' ends up being less than zero.

在这一点上我需要handwave了一下,说,这样 S1 计算意味着,如果对 S1 接近对方,两个值 S1 也将接近在一起,一般只会40014倍,相距甚远,因为他们是最初,这在我们允许 S1 的范围内,使相邻值很可能会最终在零的同一侧。

At this point I need to handwave a bit and say that the way s1' is computed means that if two initial values for s1 are close to each other, the two values for s1' will also be close together, and in general will only be 40014 times as far apart as they were initially, which in the range we allow for s1 makes neighboring values quite likely to end up on the same side of zero.

这篇关于时空相关性(采用System.Random.TF时没有present)采用System.Random时的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆