Generate a sample of 1,000,000 random permutations


Question

I am working with a large number of integer permutations. Each permutation has K elements, and each element is 1 byte in size. I need to generate N unique random permutations.
Constraints: K <= 144, N <= 1,000,000.

I came up with the following straightforward algorithm:

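  1. Generate a list of N random permutations. Store all of them in RAM.
  2. Sort the list and delete all duplicates (if any). The number of duplicates will be relatively small.
  3. If any duplicates were removed, add random permutations to the list until it holds N permutations again, then return to step 2.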
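
The three steps above can be sketched in Python as follows. This is only an illustration of the approach described in the question, not the asker's actual code; the function names `random_permutation` and `generate_unique_sorted` are made up for this example, and each permutation is assumed to be a shuffle of the values 0..K-1 stored as a `bytes` object (1 byte per element).

```python
import math
import random


def random_permutation(k, rng):
    """Return a random permutation of 0..k-1 packed as bytes (needs k <= 256)."""
    items = list(range(k))
    rng.shuffle(items)
    return bytes(items)


def generate_unique_sorted(n, k, rng=None):
    """Steps 1-3: generate, sort, drop duplicates, top up, repeat."""
    if n > math.factorial(k):
        raise ValueError("fewer than n distinct permutations exist")
    rng = rng or random.Random()
    # Step 1: generate n candidates, all held in RAM.
    perms = [random_permutation(k, rng) for _ in range(n)]
    while True:
        # Step 2: sort, then keep only the first of each run of equal items.
        perms.sort()
        unique = [p for i, p in enumerate(perms) if i == 0 or p != perms[i - 1]]
        if len(unique) == n:
            return unique
        # Step 3: top the list back up to n candidates and repeat.
        unique.extend(random_permutation(k, rng) for _ in range(n - len(unique)))
        perms = unique
```

Note that the result comes back in sorted order, and that with K = 144 and N = 1,000,000 the raw data alone is 144,000,000 bytes, before any per-object overhead.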

Is there a better way to do this? In particular, is there a way to avoid storing all permutations in RAM (writing them to disk as they are generated)?

Edit: In the end, the generated permutations need to be accessed sequentially (one by one, with no need for random access). RAM is the more critical factor (I would prefer not to store all permutations in RAM at once).

Recommended answer

One possible solution is to use a Bloom filter.

Store your permutations on disk (write them sequentially) and maintain a Bloom filter in RAM.
Each time you generate a permutation, check whether the Bloom filter already contains it. If the filter says it has not been written to disk yet, write it and add it to the filter: Bloom filters have no false negatives, so this answer is always correct.
If the filter says it is already on disk, however, it might be wrong (a false positive).
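
A minimal sketch of this idea in Python, using only the standard library. Everything named here (`BloomFilter`, `write_unique_permutations`, the default of 2^25 bits and 7 hash functions, the flat binary file layout) is an assumption made for illustration and is not part of the original answer. This version simply discards any candidate the filter reports as present, so an occasional false positive costs one wasted draw.

```python
import hashlib
import random


class BloomFilter:
    """Tiny Bloom filter over bytes objects, using double hashing."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive two 64-bit hashes from one digest, then combine them.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def write_unique_permutations(path, n, k, num_bits=1 << 25, num_hashes=7, seed=None):
    """Write n distinct random permutations of 0..k-1 to path, k bytes each."""
    rng = random.Random(seed)
    seen = BloomFilter(num_bits, num_hashes)  # 2**25 bits is 4 MiB of RAM
    written = 0
    with open(path, "wb") as out:
        while written < n:
            items = list(range(k))
            rng.shuffle(items)
            perm = bytes(items)
            if perm in seen:
                continue  # maybe a false positive; skip it and draw again
            seen.add(perm)
            out.write(perm)  # records are fixed-size, so reading back is sequential
            written += 1
    return written
```

Because every record is exactly K bytes, the file can later be read back one permutation at a time with `f.read(k)`. If N is close to the total number of distinct permutations, this loop slows down sharply, so the sketch assumes N is small compared with K!.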

If the Bloom filter says "the permutation already exists", you can either discard this candidate and move on to the next one without checking whether it is really in the set, or search the disk to see whether it is really there.
If you choose the latter, you should consider maintaining a suitable on-disk data structure for the permutations, such as a hash table or a B+ tree.

Bloom filters are a perfect match here: they are designed to represent a set that is expensive to read, while giving zero false negatives, which is the most important property here.
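
To get a feel for how much RAM such a filter needs, the standard Bloom filter sizing approximations can be evaluated directly. The helper below is a hypothetical illustration (the name `bloom_parameters` and the chosen false-positive target are assumptions, not part of the answer): for n items and a target false-positive rate p, it uses m = -n ln(p) / (ln 2)^2 bits and about (m / n) ln 2 hash functions.

```python
import math


def bloom_parameters(n, p):
    """Return (bits, hash_count) for n items and target false-positive rate p."""
    bits = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / n * math.log(2)))
    return bits, hash_count


bits, hash_count = bloom_parameters(1_000_000, 1e-6)
print(f"{bits / 8 / 2**20:.2f} MiB, {hash_count} hash functions")
```

For N = 1,000,000 and a one-in-a-million false-positive target this comes to a few megabytes, compared with 144,000,000 bytes for the raw permutations at K = 144.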
