从元素具有权重的列表中选择 k 个随机元素 [英] Select k random elements from a list whose elements have weights

查看:37
本文介绍了从元素具有权重的列表中选择 k 个随机元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

没有任何权重(相等概率)的选择被精美地描述了 此处.

Selecting without any weights (equal probabilities) is beautifully described here.

我想知道是否有办法将这种方法转换为加权方法.

I was wondering if there is a way to convert this approach to a weighted one.

我也对其他方法感兴趣.

I am also interested in other approaches as well.

更新:抽样替换

推荐答案

我知道这是一个非常古老的问题,但我认为如果您应用一点数学,有一个巧妙的技巧可以在 O(n) 时间内完成此任务!

I know this is a very old question, but I think there's a neat trick to do this in O(n) time if you apply a little math!

指数分布有两个非常有用的属性.

The exponential distribution has two very useful properties.

  1. 给定来自具有不同速率参数的不同指数分布的 n 个样本,给定样本最小的概率等于其速率参数除以所有速率参数的总和.

  1. Given n samples from different exponential distributions with different rate parameters, the probability that a given sample is the minimum is equal to its rate parameter divided by the sum of all rate parameters.

它是无记忆的".因此,如果您已经知道最小值,那么任何剩余元素是第二个到最小值的概率与如果真正的最小值被删除(并且从未生成),该元素将成为新元素的概率相同分钟这似乎很明显,但我认为由于一些条件概率问题,其他分布可能并非如此.

It is "memoryless". So if you already know the minimum, then the probability that any of the remaining elements is the 2nd-to-min is the same as the probability that if the true min were removed (and never generated), that element would have been the new min. This seems obvious, but I think because of some conditional probability issues, it might not be true of other distributions.

根据事实1,我们知道选择单个元素可以通过生成这些速率参数等于权重的指数分布样本,然后选择具有最小值的那个来完成.

Using fact 1, we know that choosing a single element can be done by generating these exponential distribution samples with rate parameter equal to the weight, and then choosing the one with minimum value.

使用事实 2,我们知道我们不必重新生成指数样本.相反,只需为每个元素生成一个,并取具有最低样本的 k 个元素.

Using fact 2, we know that we don't have to re-generate the exponential samples. Instead, just generate one for each element, and take the k elements with lowest samples.

找到最低的 k 可以在 O(n) 中完成.使用 Quickselect 算法找到第 k 个元素,然后简单地再次遍历所有元素并输出所有低于第 k 个.

Finding the lowest k can be done in O(n). Use the Quickselect algorithm to find the k-th element, then simply take another pass through all elements and output all lower than the k-th.

一个有用的注意事项:如果您无法立即访问库来生成指数分布样本,则可以通过以下方式轻松完成:-ln(rand())/weight

A useful note: if you don't have immediate access to a library to generate exponential distribution samples, it can be easily done by: -ln(rand())/weight

这篇关于从元素具有权重的列表中选择 k 个随机元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆