Python Random choice with 'percentage'


Question

Preface

This looks like a duplicate of a few Stack Overflow questions, but my situation is (probably) slightly unique.

My situation

I have a dictionary. The key is a string and the value is an integer.

I want the Python script to randomly choose N keys.

The value is the likelihood of being chosen: the higher a key's value, the higher the chance that the key is randomly chosen.

My solution

Using some other Stack Overflow posts and the power of the internet, I managed to solve it using Weighted Random.

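import random

DICT_VAR = {'best': 308281009, 'good': 7066325, 'meh': 26884, 'bad': 71, 'terrible': 16, 'never': 0}

# Repeat every key as many times as its weight, then sample uniformly.
# Warning: with weights this large the list gets hundreds of millions of entries.
list_var = []
for i in DICT_VAR.keys():
    list_var.extend([i] * DICT_VAR[i])

print(random.sample(list_var, 2))  # get 2 random choices, I suppose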

The problem (to be solved)

As you may notice, the values in the dictionary can be incredibly big (there is no upper limit) and can also be as small as 0 (zero is the smallest; there are no negative numbers).

Running this code (with slightly bigger numbers) caused my computer to freeze and stay unresponsive until I hard reset it.

My question

How should I deal with this situation? Is there another way of choosing randomly that suits my case, since Weighted Random is the worst possible solution here?

Recommended answer

I will assume here that a value of 0 means the key should never be chosen, that keys may be repeated in the sample (repetition in the dictionary is irrelevant), and that we may use a third-party module (numpy in this case). Here is code tested in Python 3.6.4; I modified it so that it should also run in Python 2.7, but I can't test it that way.

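import numpy as np

DICT_VAR = {'best': 308281009, 'good': 7066325, 'meh': 26884, 'bad': 71,
            'terrible': 16, 'never': 0}

# Split the dictionary into parallel tuples of keys and weights.
keys, weights = zip(*DICT_VAR.items())
# Normalize the weights so that they sum to 1.
probs = np.array(weights, dtype=float) / float(sum(weights))
# Draw 2 keys (with replacement) according to those probabilities.
sample_np = np.random.choice(keys, 2, p=probs)
# Convert the numpy array to a plain list of Python strings.
sample = [str(val) for val in sample_np]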

Then sample holds your sample as a list of key strings. Note that the weight for key 'best' is so much larger than the other weights that your sample will almost always be ['best', 'best'].
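To put a rough number on "almost always", here is a small sketch that uses only the weights from the question. It assumes the two draws are independent (sampling with replacement, as above); the names p_best and p_both_best are just illustrative.

```python
DICT_VAR = {'best': 308281009, 'good': 7066325, 'meh': 26884, 'bad': 71,
            'terrible': 16, 'never': 0}

total = sum(DICT_VAR.values())

# Chance that a single draw returns 'best'.
p_best = DICT_VAR['best'] / float(total)

# Chance that two independent draws (with replacement) are both 'best'.
p_both_best = p_best ** 2

print("P(one draw is 'best')   = %.4f" % p_best)
print("P(both draws are 'best') = %.4f" % p_both_best)
```

With these weights a single draw is 'best' a little under 98% of the time, so both draws are 'best' in roughly 95 to 96% of samples.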

To explain my code: first, split the dictionary's keys (strings) and values (weights) into separate sequences. Then convert the weights to probabilities: larger weights give larger probabilities, and a zero weight gives a zero probability. Then use numpy's choice function to choose a sample of keys, using the probabilities as weights. The result is a numpy array, but you seem to want a standard Python list, so the final line converts the sample of keys into a standard list.

There is, of course, a fairly short routine that could be written in standard Python, so we could avoid the use of numpy. But it would most probably be slower.
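As a rough idea of what such a standard-library routine might look like (this is an illustrative sketch, not the answerer's code), the version below keeps a running total of the weights and uses bisect to find which key a random integer falls under. The function name weighted_sample is hypothetical, and the sketch assumes non-negative integer weights and sampling with replacement, as in the question.

```python
import bisect
import random

DICT_VAR = {'best': 308281009, 'good': 7066325, 'meh': 26884, 'bad': 71,
            'terrible': 16, 'never': 0}


def weighted_sample(weights_by_key, n):
    """Draw n keys with replacement, with probability proportional to weight.

    Hypothetical helper: assumes every weight is a non-negative integer.
    """
    keys = []
    cumulative = []  # cumulative[i] = sum of weights of keys[0..i]
    running = 0
    for key, weight in weights_by_key.items():
        if weight > 0:  # zero-weight keys can never be chosen
            running += weight
            keys.append(key)
            cumulative.append(running)
    if not keys:
        raise ValueError("at least one weight must be positive")

    picks = []
    for _ in range(n):
        # Uniform integer in [0, running); works for arbitrarily large ints.
        r = random.randrange(running)
        # First position whose cumulative weight is strictly greater than r.
        picks.append(keys[bisect.bisect_right(cumulative, r)])
    return picks


print(weighted_sample(DICT_VAR, 2))
```

Memory use grows with the number of keys rather than with the size of the weights. On Python 3.6 or later, random.choices(list(DICT_VAR), weights=list(DICT_VAR.values()), k=2) from the standard library should give an equivalent result without a hand-written helper.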

Your routine was slow because it builds a large list, with each key repeated the number of times given by its value, and then chooses a sample from that list with uniform probability. With your sample data, that means building a huge list, much larger than your available RAM, which takes a long time. Numpy's choice routine can handle a non-uniform random distribution directly, without building another list.
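A back-of-the-envelope estimate shows the scale of that list. The sketch below assumes a 64-bit CPython build, where each list slot holds an 8-byte reference; it ignores the list's over-allocation while it grows, so the real footprint is likely higher. POINTER_BYTES is an illustrative constant, not something from the original code.

```python
DICT_VAR = {'best': 308281009, 'good': 7066325, 'meh': 26884, 'bad': 71,
            'terrible': 16, 'never': 0}

POINTER_BYTES = 8  # assumption: 64-bit CPython, one reference per list slot

# The expanded list has one entry per unit of weight.
entries = sum(DICT_VAR.values())
approx_bytes = entries * POINTER_BYTES

print("list entries: %d" % entries)
print("approx. GiB for the references alone: %.2f" % (approx_bytes / float(2 ** 30)))
```

The size scales linearly with the weights, so multiplying the values by ten multiplies the memory needed by ten as well, while the probability-based approach stays proportional to the number of keys.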
