有没有打印出一个洗牌列表,而实际上对矫正列表中的算法? [英] Is there an algorithm which prints out a shuffled list without actually modifing the list?

查看:145
本文介绍了有没有打印出一个洗牌列表,而实际上对矫正列表中的算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

阅读这个问题我开始怀疑后:是否有可能有一个洗牌的算法,不修改或复制原来的名单?

After reading this question I started to wonder: is it possible to have a shuffling algorithm which does not modify or copy the original list?

要弄清楚:

想象一下,您将得到对象的列表。该列表的大小可以是任意的,但假设它是pretty的大(比如,10,000,000项)。你需要打印出清单的项目以随机顺序,而你需要做的是尽可能快。但是,你不应该:

Imagine you are given a list of objects. The list size can be arbitrary, but assume it's pretty large (say, 10,000,000 items). You need to print out the items of the list in random order, and you need to do it as fast as possible. However, you should not:

  • 复制原来的名单,因为它是非常大的,复制会浪费内存(可能击中可用RAM的限制)很多;
  • 修改原来的列表中,因为它的排序以某种方式和其他一些部分后来取决于它被排序。
  • 创建一个索引列表,因为,再次,名单是非常大的,复制通吃太多的时间和内存。 (澄清:这是指任何其他列表,其具有元件作为原始列表的数相同)

这可能吗?

补充:更多说明

  1. 我想要改组的列表中真正随机的方式与所有排列同样有可能(当然,假设我们有一个正确的RAND()函数开始)。
  2. 建议,我使指针的列表,或指数列表,或任何其他列表,将有元件作为原始列表的数相同,明确地由原始质询视为低效的。您可以创建其他列表,如果你想要的,但他们应该是要比原来的列表较小的严重订单。
  3. 在原来的列表就像一个数组,你可以通过它的索引在O(1)检索它的任何项目。 (因此,没有双向链表的东西,在那里你必须通过列表循环到你需要的项目。)

新增2 :好吧,让我们这么说吧:你有一个1TB移动硬盘装满数据的项目,每个项目512个字节大(一个扇区)。要将所有这些数据复制到另一个1TB移动硬盘,而洗牌的所有项目。你想这样做尽可能快(单传过来的数据,等)。你有512MB的RAM可用,而不要指望掉。 (这是一个理论性的情况下,我没有这样的事在实践中。我只是想找到最完美的algorithm.item。)

Added 2: OK, let's put it this way: You have a 1TB HDD filled with data items, each 512 bytes large (a single sector). You want to copy all this data to another 1TB HDD while shuffling all the items. You want to do this as fast as possible (single pass over data, etc). You have 512MB of RAM available, and don't count on swap. (This is a theoretical scenario, I don't have anything like this in practice. I just want to find the perfect algorithm.item.)

推荐答案

下面是一个非常简单的证明,没有PRNG方案可以工作:

Here is a very simple proof that no PRNG scheme can work:

该PRNG想法有两个阶段:首先,选择一个PRNG和它的初始状态;第二,利用PRNG洗牌输出。嗯,还有的 N!的可能的排列,所以你至少需要的 N!的不同可能启动的状态,在进入第2阶段。这意味着,在第二阶段开始时,你必须已经至少的登录 2 N!的状态,这是不允许的位。

The PRNG idea has two phases: first, select a PRNG and its initial state; second, use the PRNG to shuffle the output. Well, there are N! possible permutations, so you need at least N! different possible start states, entering phase 2. This means that at the start of phase 2 you must have at least log2 N! bits of state, which isn't allowed.

然而,这并不排除方案,其中所述算法接收来自环境的新的随机比特,因为它去。有可能是,比如说,一个PRNG读取其初始状态的懒洋洋地的,但保证不会重复。我们可以证明有没有?

However this does not rule out schemes where the algorithm receives new random bits from the environment as it goes. There might be, say, a PRNG that reads its initial state lazily and yet is guaranteed not to repeat. Can we prove there isn't?

假设我们有一个完美的洗牌算法。试想一下,我们开始运行它,当它完成了一半工作,我们把计算机进入睡眠状态。现在该程序的完整状态已保存的地方。让我们的取值的是集合所有可能的状态的程序可以在这个半路标记是

Suppose we do have a perfect shuffling algorithm. Imagine we start running it, and when it's halfway done, we put the computer to sleep. Now the full state of the program has been saved somewhere. Let S be the set of all possible states the program could be in at this halfway mark.

由于算法是正确的,并保证终止,还有一个功能的 F 的其中,由于所保存的程序状态加上任何足够长的位串,产生磁盘读取的一个有效的序列和写入完成洗牌。计算机本身实现此功能。但认为这是一个数学函数:

Since the algorithm is correct and guaranteed to terminate, there is a function f which, given the saved program state plus any long enough string of bits, produces a valid sequence of disk reads and writes completing the shuffle. The computer itself implements this function. But consider it as a mathematical function:

F (S&倍;比特)RARR;顺序读取和写入

那么,平凡,存在一个函数的先按g 的其中给出的只有的保存的程序状态,产生一组磁盘位置尚未读取和写入。 (简单地传位的一些任意字符串的 F 的,然后看看结果。)

Then, trivially, there exists a function g which, given only the saved program state, produces the set of disk locations yet to be read and written. (Simply pass some arbitrary string of bits to f, then look at the results.)

先按g 取值的→ 设置位置读取和写入

g : Sset of locations to read and write

证明的其余位是表明的先按g 的至少包含的 <子> N C <子> N / 2 <域/ em>的无论算法的选择的不同组。如果这是真的,必须有至少是许多元素的取值的,因此该方案的国家必须至少包含的登录 2 <子> N < /分> C <子> N / 2 的位在半路标志,违反了规定。

The remaining bit of the proof is to show that the domain of g contains at least NCN/2 different sets regardless of the choice of algorithm. If that's true, there must be at least that many elements of S, and so the state of the program must contain at least log2 NCN/2 bits at the halfway mark, in violation of the requirements.

我不知道如何证明最后一点,虽然,因为的或者的设定-OF-位置阅读的的集合 - - 位置-to-写入可以是低熵,取决于算法。我怀疑有信息理论,可以切结的一些明显的原则。标记的人希望这个社区的wiki将提供它。

I'm not sure how to prove that last bit, though, since either the set-of-locations-to-read or the set-of-locations-to-write can be low-entropy, depending on the algorithm. I suspect there's some obvious principle of information theory that can cut the knot. Marking this community wiki in the hopes someone will supply it.

这篇关于有没有打印出一个洗牌列表,而实际上对矫正列表中的算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆