从具有加权概率的列表中随机选择 [英] Randomly choosing from a list with weighted probabilities

查看:32
本文介绍了从具有加权概率的列表中随机选择的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含 N 个元素的数组(代表给定字母表的 N 个字母),数组的每个单元格都保存一个整数值,该整数值表示该字母的给定文本中出现的次数.现在我想根据给定约束的出现次数从字母表中的所有字母中随机选择一个字母:

I have an array of N elements (representing the N letters of a given alphabet), and each cell of the array holds an integer value, that integer value meaning the number of occurrences in a given text of that letter. Now I want to randomly choose a letter from all of the letters in the alphabet, based on his number of appearances with the given constraints:

  • 如果字母具有正(非零)值,则算法始终可以选择它(当然,概率更大或更小).

  • If the letter has a positive (nonzero) value, then it can be always chosen by the algorithm (with a bigger or smaller probability, of course).

如果字母 A 的值比字母 B 的值高,那么它必须更有可能被算法选中.

If a letter A has a higher value than a letter B, then it has to be more likely to be chosen by the algorithm.

现在,考虑到这一点,我想出了一个简单的算法来完成这项工作,但我只是想知道是否有更好的方法可以做.这似乎是非常基本的,我认为为了更有效地实现这一点,可能需要做更多聪明的事情.这是我认为的算法:

Now, taking that into account, I've come up with a simple algorithm that might do the job, but I was just wondering if there was a better thing to do. This seems to be quite fundamental, and I think there might be more clever things to do in order to accomplish this more efficiently. This is the algorithm i thought:

  • 将数组中的所有频率相加.存储在 SUM 中
  • 选择一个从 0 到 SUM 的随机值.将其存储在 RAN 中
  • [While] RAN > 0,从第一个开始,依次访问数组中的每个单元格,然后从RAN中减去该单元格的值
  • 最后访问的单元格是选中的单元格

那么,还有比这更好的方法吗?我错过了什么吗?

So, is there a better thing to do than this? Am I missing something?

我知道大多数现代计算机的计算速度如此之快,我什至不会注意到我的算法是否效率低下,因此这更像是一个理论问题而不是实际问题.

I'm aware most modern computers can compute this so fast I won't even notice if my algorithm is inefficient, so this is more of a theoretical question rather than a practical one.

我更喜欢解释性算法,而不是仅仅为答案编写代码,但是如果您更愿意在代码中提供答案,我对此没有意见.

I prefer an explained algorithm rather than just code for an answer, but If you're more comfortable providing your answer in code, I have no problem with that.

推荐答案

思路:

  • 遍历所有元素并将每个元素的值设置为迄今为止的累积频率.
  • 生成一个介于 1 和所有频率之和之间的随机数
  • 对该数字的值进行二分搜索(找到大于或等于该数字的第一个值).
  • Iterate through all the elements and set the value of each element as the cumulative frequency thus far.
  • Generate a random number between 1 and the sum of all frequencies
  • Do a binary search on the values for this number (finding the first value greater than or equal to the number).

示例:

Element    A B C D
Frequency  1 4 3 2
Cumulative 1 5 8 10

生成1-10范围内的随机数(1+4+3+2 = 10,与累加列表最后一个值相同),进行二分查找,返回值如下:

Generate a random number in the range 1-10 (1+4+3+2 = 10, the same as the last value in the cumulative list), do a binary search, which will return values as follows:

Number   Element returned
1        A
2        B
3        B
4        B
5        B
6        C
7        C
8        C
9        D
10       D

这篇关于从具有加权概率的列表中随机选择的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆