选择K掉的n [英] Choosing k out of n

查看:137
本文介绍了选择K掉的n的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要选择 K 元素均匀随机出一个可能的 N 而不选择相同数量的两倍。有两种方法琐碎于此。

I want to choose k elements uniformly at random out of a possible n without choosing the same number twice. There are two trivial approaches to this.

  1. 请所有 N 的可能性列表。随机播放它们(你不需要 打乱所有 N 数字只是 K 他们通过执行第一 K 步骤费舍尔耶茨)。选择第一个 K 。这种方法 需要 O(K)时间(假设分配大小的数组 N 需要 O(1)的时间)和 O(N)的空间。这是一个问题,如果 K 很 相对于 N
  2. 存储一组看到的元素。选择一个号码随机地从 [0,N-1] 。而该元件是在该组,然后选择一个新的数。 此方法采用 O(K)的空间。在运行时是多一点 复杂的分析。如果 K = THETA(N)则运行时间 O(K * LG(K))= O(N * LG(N)),因为它是的优惠券收藏 问题的。如果 K 是相对于 N 则需要稍微小 选择比 O(K)更多的是因为概率(虽然低) 相同的数量的两倍。这比在上述溶液中更好 空间方面,而且更糟糕的在运行时间方面。
  1. Make a list of all n possibilities. Shuffle them (you don't need to shuffle all n numbers just k of them by performing the first k steps of Fisher Yates). Choose the first k. This approach takes O(k) time (assuming allocating an array of size n takes O(1) time) and O(n) space. This is a problem if k is very small relative to n.
  2. Store a set of seen elements. Choose a number at random from [0, n-1]. While the element is in the set then choose a new number. This approach takes O(k) space. The run-time is a little more complicated to analyze. If k = theta(n) then the run-time is O(k*lg(k))=O(n*lg(n)) because it is the coupon collector's problem. If k is small relative to n then it takes slightly more than O(k) because of the probability (albeit low) of choosing the same number twice. This is better than the above solution in terms of space but worse in terms of run-time.

我的问题:

有一个 O(K)时, O(K)空间算法对所有<$ C $ ç> K 和 N

is there an O(k) time, O(k) space algorithm for all k and n?

推荐答案

随着Ø (1)哈希表,部分费雪耶茨的方法,可以向运行在O( K 的)时间和空间。诀窍是简单地只存储的修改的数组中的哈希表中的元素。

With an O(1) hash table, the partial Fisher-Yates method can be made to run in O(k) time and space. The trick is simply to store only the changed elements of the array in the hash table.

下面是Java中的一个简单的例子:

Here's a simple example in Java:

public static int[] getRandomSelection (int k, int n, Random rng) {
    if (k > n) throw new IllegalArgumentException(
        "Cannot choose " + k + " elements out of " + n + "."
    );

    HashMap<Integer, Integer> hash = new HashMap<Integer, Integer>(2*k);
    int[] output = new int[k];

    for (int i = 0; i < k; i++) {
        int j = i + rng.nextInt(n - i);
        output[i] = (hash.containsKey(j) ? hash.remove(j) : j);
        if (j > i) hash.put(j, (hash.containsKey(i) ? hash.remove(i) : i));
    }
    return output;
}

这code分配的2倍是一个HashMap; K 的水桶存储修改元素(这应该足以确保哈希表永远不会改头换面),只是运行的部分费舍尔-Yates洗牌就可以了。

This code allocates a HashMap of 2×k buckets to store the modified elements (which should be enough to ensure that the hash table is never rehashed), and just runs a partial Fisher-Yates shuffle on it.

这里有Ideone 的快速测试;它选择两个元素出三30000次,并计算的时候每对元素的被选择的号码。对于一个不带偏见的洗牌,每个有序对应该会出现约5000名(PM 100左右)时,除了不可能的情况下,这两个因素将等于

Here's a quick test on Ideone; it picks two elements out of three 30,000 times, and counts the number of times each pair of elements gets chosen. For an unbiased shuffle, each ordered pair should appear approximately 5,000 (±100 or so) times, except for the impossible cases where both elements would be equal.

这篇关于选择K掉的n的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆