如何随机地整理排列比PRNG期间更多的列表? [英] How to randomly shuffle a list that has more permutations than the PRNG's period?

查看:64
本文介绍了如何随机地整理排列比PRNG期间更多的列表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含约3900个元素的列表,我需要对其进行随机置换以产生统计分布.我环顾四周,发现此要洗牌的最大列表长度使用Python random.shuffle 解释说,Python中PRNG的周期为2**19937-1,这导致列表的最大长度为2080,然后才不可能生成所有可能的排列.我只生成列表的300-1000个排列,因此不太可能生成重复的排列,但是,由于这会产生统计分布,因此我希望将所有可能的排列作为潜在样本.

I have a list with around 3900 elements that I need to randomly permute to produce a statistical distribution. I looked around and found this Maximal Length of List to Shuffle with Python random.shuffle that explains that the period of the PRNG in Python is 2**19937-1, which leads to a list with a maximum length of 2080 before it becomes impossible to generate all possible permutations. I am only producing 300-1000 permutations of the list so it unlikely that I will be producing duplicate permutations, however, since this is producing a statistical distribution I would like to have all possible permutations as potential samples.

推荐答案

我同意@ user2357112,这不太可能是真正的问题-但似乎您应该能够在其中使用标准的random模块这样至少所有排列都是可能的.

I agree with @user2357112 that it is unlikely to be a genuine issue -- but it seems like you should be able to use the standard random module in such a way that all permutations are at least possible.

您可以采用分而治之的方法.使用初始种子将列表分为2个列表,每个列表约2000个.此类分区的数量大约为C(4000,2000),大约为1.66 x 10^1202.这小于周期,这表明至少有可能使用random.sample()生成所有此类分区.然后-重新设置随机数生成器的种子,并填充前半部分.然后-重新播种并置换下半场.也许在播种之前花了一点时间,所以您不会遇到涉及系统时钟分辨率的问题.您也可以尝试将初始列表随机划分为大量的较小列表.

You could do a divide-and-conquer approach. Use the initial seed to partition the list into 2 lists of around 2000 each. The number of such partitions is roughly C(4000,2000) which is approximately 1.66 x 10^1202. This is less that the period, which suggests that it is at least possible for all such partitions to be generated with random.sample(). Then - reseed the random number generator and permute the first half. Then -- reseed a second time and permute the second half. Perhaps throw in little time delays before the reseedings so you don't run into issues involving the resolution of your system clock. You could also experiment in randomly partitioning the initial list into larger numbers of smaller lists.

从数学上看,很容易看出,如果您将一个列表随机划分为多个子列表,以便每个分区都具有同等的可能性,然后以使每个子列表均具有同等可能性的方式对每个子列表进行排列,然后将这些子列表的排列粘合在一起要获得整个列表的排列,那么所有整个列表的排列都是一样的.

Mathematically, it is easy to see that if you randomly partition a list into sublists so that each partition is equally likely and then you permute each sublist in such a way that all sublist permutations are equally likely, and glue together these sublist permutations to get a whole-list permutation, then all whole-list permutations are equally likely.

这是一个实现:

import random, time

def permuted(items, pieces = 2):
    sublists = [[] for i in range(pieces)]
    for x in items:
        sublists[random.randint(0,pieces-1)].append(x)
    permutedList = []
    for i in range(pieces):
        time.sleep(0.01)
        random.seed()
        random.shuffle(sublists[i])
        permutedList.extend(sublists[i])
    return permutedList

我不确定确实需要time.sleep(0.01).我担心的是,如果种子在毫秒内发生,那么在某些系统上可能会使用相同的种子.

I'm not sure that time.sleep(0.01) is really needed. My concern was that if the reseeds happened within a millisecond then on some systems the same seed might be used.

作为最后的话,仅仅因为上述功能(具有合适的选择)不能以简单的计数参数被示出为未命中某些置换(初始状态的数量相比较排列数)本本身并不能证明所有排列实际上都是可能的.这将需要对随机数生成器,为其植入种子的哈希函数以及随机播放算法进行更详细的分析.

As a final remark, just because the above function (with suitable choice of pieces) can't be shown to miss certain permutations by a simple counting argument (comparing the number of permutations with the number of initial states) this doesn't in and of itself constitute proof that all permutations are in fact possible. That would require a more detailed analysis of the random number generator, the hash function that seeds it, and the shuffle algorithm.

这篇关于如何随机地整理排列比PRNG期间更多的列表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆