How to select a record based on probability?


Problem description

I have a table of names with two fields, dblProbability and txName, where dblProbability is the relative probability that the name in that row should be selected. How can I get one random name from my table, taking into account the probability that it should occur?

Solution

Make a call to Rnd() seeded with the date and/or time, depending on how often you want to run the query. If the returned value is less than or equal to your probability field, then return that row.
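The listing that accompanied this answer is not available here. As a rough illustration of the per-row comparison it describes, here is a minimal Python sketch rather than Access code. It assumes dblProbability is stored on a 0-1 scale, which the post does not state, and the function name `rows_passing_draw` is hypothetical.

```python
import random

def rows_passing_draw(rows, rng=None):
    """Keep every row whose own random draw is <= its probability.

    rows: iterable of (txName, dblProbability) pairs. dblProbability is
    assumed to lie on a 0-1 scale (an assumption, not stated in the post).
    """
    # An unseeded Random() seeds itself from OS entropy or the system time.
    rng = rng or random.Random()
    selected = []
    for name, probability in rows:
        draw = rng.random()  # uniform in [0.0, 1.0), the same range Rnd() covers
        if draw <= probability:
            selected.append(name)
    return selected

names = rows_passing_draw([("name_alpha", 0.63), ("name_beta", 0.25), ("name_charlie", 0.12)])
```

Because each row is tested independently, a single run can return no names at all or several at once, so this reading does not by itself guarantee exactly one name.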



@Rabbit:
Are we sure that TheSmileyCoder isn't comparing between rows?
I took this question to mean that, given "name_alpha/5" vs. "name_beta/2" vs. "name_charlie/1", name_alpha would be 5 times as likely to be picked as name_charlie and 2.5 times as likely as name_beta, and similarly name_beta would be twice as likely to be picked as name_charlie.
@TheSmileyCoder:
Are you comparing the rows against each other? How are you assigning the probabilities: as in the example I gave above, which is somewhat arbitrary, or do the probabilities have to add up to 100% (so that the above becomes roughly "name_alpha/63" vs. "name_beta/25" vs. "name_charlie/12", which might be easier to code for)?
If there is no requirement that they sum to 100%, are you allowing equal probabilities to be assigned? That will toss a monkey wrench into the logic!

In anticipation of your answer:
So, let's go with the no-monkey-wrench case first (it sets the foundation for the monkey-wrench case):
In the second method, use rnd(1,100): if rnd(1,100) >= 63 then name_alpha; if rnd(1,100) is between 12 and 63 then name_beta; if rnd(1,100) <= 12 then name_charlie. If we trust that the rnd function is truly random, then the weighting should work.
This should work even if you don't require the sum to be 100: convert the values to weights as you would for a weighted average. Sum all of the probabilities (5 + 2 + 1 = 8), then convert to percentages by division (5/8, 2/8, 1/8), which gives approximately 63, 25, 12.
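The conversion from arbitrary relative weights to percentages can be sketched in a few lines of Python. The function name `to_percentages` is hypothetical, and the dictionary stands in for the table from the question.

```python
def to_percentages(weights):
    """Scale relative weights so that they sum to 100."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: 100 * weight / total for name, weight in weights.items()}

shares = to_percentages({"name_alpha": 5, "name_beta": 2, "name_charlie": 1})
# shares == {"name_alpha": 62.5, "name_beta": 25.0, "name_charlie": 12.5}
```

The exact shares are 62.5, 25 and 12.5; the 63, 25 and 12 quoted in the post are these values rounded.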
From here, the query should return all records where the [probability] is greater than or equal to rnd(1,100) (converted back to the arbitrary scale if needed). Now logically we know that these records should be chosen more often than the remaining records. So do we return the record with the max([probability]) or the min([probability])? I opt for min([probability]), because if rnd(1,100) were small then max([probability]) would always be returned. You have your single name.
Monkey-wrench:
Once again, pull the records as before and look for the min([probability]); however, this time check whether count(min([probability])) > 1. If you do have more than one record with that probability, then we need to pull all of the records that have min([probability]) and do an un-weighted random pick among them, as they would have had an equal chance of being selected to begin with.
-z
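Read literally, the filter-then-minimum procedure in the post above, including the tie-break, could look like the following Python sketch. The function name `pick_by_min_probability` is hypothetical, and the draw is passed in as a parameter so that the behaviour is easy to follow.

```python
import random

def pick_by_min_probability(rows, draw, rng=None):
    """rows: list of (name, percentage) pairs; draw: a number on the 1-100 scale."""
    rng = rng or random.Random()
    # Keep the records whose percentage is at least the draw.
    candidates = [(name, pct) for name, pct in rows if pct >= draw]
    if not candidates:
        return None  # the draw exceeded every percentage
    lowest = min(pct for _, pct in candidates)
    # Every record sharing the lowest percentage gets an equal chance.
    tied = [name for name, pct in candidates if pct == lowest]
    return rng.choice(tied)

rows = [("name_alpha", 63), ("name_beta", 25), ("name_charlie", 12)]
print(pick_by_min_probability(rows, draw=20))  # prints name_beta
```

A draw above the largest percentage leaves no candidates, so the sketch returns None in that case. How often each name comes back depends on the gaps between the percentages, not on the percentages themselves.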


I assumed they were comparing just within the row. If they are trying to pick one out of a cumulative probability, then it's pretty much the same thing, except you will need a start range and an end range and an additional comparison.
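The listing for this reply is not available here either. One way to picture the start-range and end-range idea is the Python sketch below, which gives each row a slice of a running total and returns the row whose slice contains the draw. The function name `pick_weighted` is hypothetical, and the weights are assumed to be non-negative with at least one positive value.

```python
import random
from itertools import accumulate

def pick_weighted(rows, rng=None):
    """rows: non-empty list of (name, weight) pairs with non-negative weights."""
    rng = rng or random.Random()
    if not rows:
        raise ValueError("rows must not be empty")
    ends = list(accumulate(weight for _, weight in rows))  # end of each row's range
    total = ends[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    draw = rng.random() * total  # uniform over the combined range
    start = 0.0
    for (name, _), end in zip(rows, ends):
        # The two comparisons: the draw must fall inside [start, end).
        if start <= draw < end:
            return name
        start = end
    return rows[-1][0]  # guard in case rounding pushes the draw up to total

winner = pick_weighted([("name_alpha", 5), ("name_beta", 2), ("name_charlie", 1)])
```

With weights 5, 2 and 1 the ranges are [0, 5), [5, 7) and [7, 8), so each name is returned in proportion to its weight and the weights do not need to sum to 100. On the rounded percentage scale the same ranges are 1-63, 64-88 and 89-100. In Python specifically, the standard library's `random.choices(names, weights=...)` does the same job.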


