如何在不更换逐步样? [英] How to incrementally sample without replacement?

查看:185
本文介绍了如何在不更换逐步样?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Python有 my_sample = random.sample(范围(100),10)来随机抽取无需更换从 [0,100)

Python has my_sample = random.sample(range(100), 10) to randomly sample without replacement from [0, 100).

假设我已经取样 N 这样的数字,现在我想品尝多一个不用更换(不包括任何的previously采样的 ñ),如何做到这一点的超级效率?

Suppose I have sampled n such numbers and now I want to sample one more without replacement (without including any of the previously sampled n), how to do so super efficiently?

更新:由「合理有效地到超级高效的(但忽略常数因子)

update: changed from "reasonably efficiently" to "super efficiently" (but ignoring constant factors)

推荐答案

注意从OP读者:请考虑在看的最初接受的答案来理解其中的逻辑,然后明白这个答案。

Note to readers from OP: Please consider looking at the originally accepted answer to understand the logic, and then understand this answer.

Aaaaaand的完整性的缘故:这是 no_answer_not_upvoted的回答的概念,但适应所以需要禁止的号码清单作为输入。这是一样的code在我的previous答案的,但我们从<$构建一个状态C $ C>禁止,之前我们生成的数字。

Aaaaaand for completeness sake: This is the concept of no_answer_not_upvoted’s answer, but adapted so it takes a list of forbidden numbers as input. This is just the same code as in my previous answer, but we build a state from forbid, before we generate numbers.

  • 这是时间 O(F + K)和内存 O(F + K)。显然,这是最快的东西可能不朝禁止(排序/套)的格式要求。我想,这使得这个以某种方式^^赢家。
  • 如果禁止是一组,反复猜测的方法是更快 O(k⋅n/(N-(F + K)) ),这是非常接近 O(K) F + K 不是非常的接近 N
  • 如果禁止进行排序,我可笑的算法是快搭配:
  • This is time O(f+k) and memory O(f+k). Obviously this is the fastest thing possible without requirements towards the format of forbid (sorted/set). I think this makes this a winner in some way ^^.
  • If forbid is a set, the repeated guessing method is faster with O(k⋅n/(n-(f+k))), which is very close to O(k) for f+k not very close to n.
  • If forbid is sorted, my ridiculous algorithm is faster with:
import random
def sample_gen(n, forbid):
    state = dict()
    track = dict()
    for (i, o) in enumerate(forbid):
        x = track.get(o, o)
        t = state.get(n-i-1, n-i-1)
        state[x] = t
        track[t] = x
        state.pop(n-i-1, None)
        track.pop(o, None)
    del track
    for remaining in xrange(n-len(forbid), 0, -1):
        i = random.randrange(remaining)
        yield state.get(i, i)
        state[i] = state.get(remaining - 1, remaining - 1)
        state.pop(remaining - 1, None)

用法:

gen = sample_gen(10, [1, 2, 4, 8])
print gen.next()
print gen.next()
print gen.next()
print gen.next()

这篇关于如何在不更换逐步样?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆