如何在不替换的情况下增量采样? [英] How to incrementally sample without replacement?

查看:33
本文介绍了如何在不替换的情况下增量采样?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Python 有 my_sample = random.sample(range(100), 10) 可以从 [0, 100) 随机抽样而不替换.

Python has my_sample = random.sample(range(100), 10) to randomly sample without replacement from [0, 100).

假设我已经采样了 n 个这样的数字,现在我想在不替换的情况下再采样一个(不包括之前采样的任何一个 n),怎么做超级有效吗?

Suppose I have sampled n such numbers and now I want to sample one more without replacement (without including any of the previously sampled n), how to do so super efficiently?

更新:从合理高效"变为超级高效"(但忽略常数因素)

update: changed from "reasonably efficiently" to "super efficiently" (but ignoring constant factors)

推荐答案

OP 读者注意事项:请考虑查看 原先接受的答案 理解逻辑,再理解这个答案.

Note to readers from OP: Please consider looking at the originally accepted answer to understand the logic, and then understand this answer.

Aaaaaand 为完整起见:这是死灵法师的答案的概念,但进行了调整,因此需要一个禁用数字列表作为输入.这与我之前的答案中的代码相同,但我们从forbid构建了一个状态, 在我们生成数字之前.

Aaaaaand for completeness sake: This is the concept of necromancer’s answer, but adapted so it takes a list of forbidden numbers as input. This is just the same code as in my previous answer, but we build a state from forbid, before we generate numbers.

  • 这是时间O(f+k)和内存O(f+k).显然,这是最快的方法,不需要forbid(排序/设置)的格式.我认为这在某种程度上使它成为赢家^^.
  • 如果forbid是一个集合,重复猜测的方法用O(k⋅n/(n-(f+k)))更快,这是非常接近 O(k) 对于 f+k 不是 非常 接近 n.
  • 如果 forbid 已排序,我的荒谬算法更快:
  • This is time O(f+k) and memory O(f+k). Obviously this is the fastest thing possible without requirements towards the format of forbid (sorted/set). I think this makes this a winner in some way ^^.
  • If forbid is a set, the repeated guessing method is faster with O(k⋅n/(n-(f+k))), which is very close to O(k) for f+k not very close to n.
  • If forbid is sorted, my ridiculous algorithm is faster with:
import random
def sample_gen(n, forbid):
    state = dict()
    track = dict()
    for (i, o) in enumerate(forbid):
        x = track.get(o, o)
        t = state.get(n-i-1, n-i-1)
        state[x] = t
        track[t] = x
        state.pop(n-i-1, None)
        track.pop(o, None)
    del track
    for remaining in xrange(n-len(forbid), 0, -1):
        i = random.randrange(remaining)
        yield state.get(i, i)
        state[i] = state.get(remaining - 1, remaining - 1)
        state.pop(remaining - 1, None)

用法:

gen = sample_gen(10, [1, 2, 4, 8])
print gen.next()
print gen.next()
print gen.next()
print gen.next()

这篇关于如何在不替换的情况下增量采样?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆