How to do a weighted random sample of categories in Python


Question

Given a list of tuples, where each tuple consists of a probability and an item, I'd like to sample an item according to its probability. For example, given the list [(.3, 'a'), (.4, 'b'), (.3, 'c')], I'd like to sample 'b' 40% of the time.

What's the canonical way of doing this in Python?

I've looked at the random module, which doesn't seem to have an appropriate function, and at numpy.random, which has a multinomial function but doesn't seem to return the results in a convenient form for this problem. I'm basically looking for something like mnrnd in MATLAB.

Many thanks.

Thanks for all the quick answers. To clarify, I'm not looking for explanations of how to write a sampling scheme, but rather to be pointed to an easy way to sample from a multinomial distribution given a set of objects and weights, or to be told that no such function exists in a standard library and so one should write one's own.

Recommended answer

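import numpy

n = 1000
pairs = [(.3, 'a'), (.3, 'b'), (.4, 'c')]
# Split the pairs into parallel tuples; zip() returns an iterator in
# Python 3, so it cannot be indexed directly as in the Python 2 original.
weights, items = zip(*pairs)
# Number of times each item is drawn out of n trials
counts = numpy.random.multinomial(n, weights)
result = list(zip(counts, items))
# e.g. [(299, 'a'), (299, 'b'), (402, 'c')]; counts vary per run and sum to n
[item * int(count) for count, item in result]
# e.g. ['aaa...', 'bbb...', 'ccc...'], each letter repeated `count` times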

How exactly would you like to receive the results?
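If the goal is to draw individual items rather than per-item counts, one option is `random.choices` from the standard library (available since Python 3.6), which accepts a `weights` argument. The following is only a minimal sketch under that assumption; the helper name `weighted_sample` and its `seed` parameter are hypothetical, introduced here for illustration.

```python
import random


def weighted_sample(pairs, k=1, seed=None):
    """Draw k items, with replacement, from (probability, item) pairs.

    Hypothetical helper for illustration; not part of any library.
    """
    weights, items = zip(*pairs)
    # A private generator, so seeding does not touch the global random state
    rng = random.Random(seed)
    return rng.choices(items, weights=weights, k=k)


pairs = [(.3, 'a'), (.4, 'b'), (.3, 'c')]
draws = weighted_sample(pairs, k=10000, seed=0)
# Fraction of draws that came out as 'b'; should be close to 0.4
share_b = draws.count('b') / len(draws)
```

`random.choices` samples with replacement and returns a list, so a single draw is `weighted_sample(pairs)[0]`. The weights do not have to sum to 1; they are treated as relative weights.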
