从加权列表中选择一个随机项目 [英] Select a random item from a weighted list

查看:106
本文介绍了从加权列表中选择一个随机项目的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试编写一个程序来从美国人口普查中选择一个随机名称姓氏列表.列表格式为

I am trying to write a program to select a random name from the US Census last name list. The list format is

Name           Weight Cumulative line
-----          -----  -----      -
SMITH          1.006  1.006      1
JOHNSON        0.810  1.816      2
WILLIAMS       0.699  2.515      3
JONES          0.621  3.136      4
BROWN          0.621  3.757      5
DAVIS          0.480  4.237      6

假设我将数据加载到类似的结构中

Assuming I load the data in to a structure like

Class Name
{
    public string Name {get; set;}
    public decimal Weight {get; set;}
    public decimal Cumulative {get; set;}
}

哪种数据结构最适合保存名称列表,什么是从列表中选择随机名称但名称分布与现实世界相同的最佳方法.

What data structure would be best to hold the list of names, and what would be the best way to select a random name from the list but have the distribution of names be the same as the real world.

如果只有前10,000行会影响数据结构,我将只使用它.

I will only be working with the first 10,000 rows if it makes a difference in the data structure.

我尝试着研究其他一些关于加权随机性的问题,但是在将理论转化为代码方面遇到了一些麻烦.我对数学理论了解不多,所以不知道这是一个有或没有替换"的随机选择,我希望同一个名字能够多次出现,无论那个意思是什么.

I have tried looking at some of the other questions about weighted randomness but I am having a bit of trouble turning theory in to code. I do not know much about math theory so I do not know if this is a "With or without replacement" random selection, I want the same name able to show up more than once, which ever that one means.

推荐答案

处理此问题的最简单"方法是将其保存在列表中.

The "easiest" way to handle this would be to keep this in a list.

然后您可以使用:

Name GetRandomName(Random random, List<Name> names)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    return names.Last(name => name.Culmitive <= value);
}

如果需要考虑速度,则可以存储仅包含Culmitive值的单独数组.这样,您可以使用Array.BinarySearch快速找到合适的索引:

If speed is a concern, you could store a separate array of just the Culmitive values. With this, you could use Array.BinarySearch to quickly find the appropriate index:

Name GetRandomName(Random random, List<Name> names, double[] culmitiveValues)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    int index = Array.BinarySearch(culmitiveValues, value);
    if (index >= 0)
        index = ~index;

    return names[index];
}

另一个可能是最有效的选择是使用 C5通用集合库树类.然后,您可以使用RangeFrom查找适当的名称.这样做的好处是不需要单独的收集

Another option, which is probably the most efficient, would be to use something like one of the C5 Generic Collection Library's tree classes. You could then use RangeFrom to find the appropriate name. This has the advantage of not requiring a separate collection

这篇关于从加权列表中选择一个随机项目的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆