Numpy drawing from urn
Problem description
I want to run a relatively simple random draw in numpy, but I can't find a good way to express it. I think the best way is to describe it as drawing from an urn without replacement. I have an urn with k colors, and n_k balls of every color. I want to draw m balls, and know how many balls of every color I have.
My current attempt is
np.bincount(np.random.permutation(np.repeat(np.arange(k), n_k))[:m], minlength=k)
Here, n_k is an array of length k with the counts of the balls.
It seems that's equivalent to
np.bincount(np.random.choice(k, m, p=n_k / n_k.sum()), minlength=k)
which is a bit better, but still not great.
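One caveat on the second expression: np.random.choice draws with replacement by default, so it only approximates an urn draw and can return more balls of a color than the urn holds. The following is a minimal sketch of the difference, assuming the newer Generator API; the counts in n_k, the seed, and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed used only to make the sketch repeatable
n_k = np.array([2, 4, 8, 16])    # hypothetical number of balls of each color
k = len(n_k)
m = 10                           # number of balls to draw

# Without replacement: label every ball with its color, pick m distinct
# balls, then count how many of each color were picked.
balls = np.repeat(np.arange(k), n_k)
without = np.bincount(rng.choice(balls, size=m, replace=False), minlength=k)

# With replacement: colors are drawn independently with probabilities
# n_k / n_k.sum(), so a count may exceed the number of balls available.
with_repl = np.bincount(rng.choice(k, size=m, p=n_k / n_k.sum()), minlength=k)

print(without, with_repl)
```

Only the first result is guaranteed to satisfy without[i] <= n_k[i] for every color.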
What you want is an implementation of the multivariate hypergeometric distribution.
I don't know of one in numpy or scipy, but it might already exist out there somewhere.
I contributed an implementation of the multivariate hypergeometric distribution to numpy 1.18.0; see numpy.random.Generator.multivariate_hypergeometric.
For example, to draw 15 marbles from an urn containing 12 red, 4 green and 18 blue marbles, and repeat the process 10 times:
In [4]: import numpy as np
In [5]: rng = np.random.default_rng()
In [6]: colors = [12, 4, 18]
In [7]: rng.multivariate_hypergeometric(colors, 15, size=10)
Out[7]:
array([[ 5,  4,  6],
       [ 3,  3,  9],
       [ 6,  2,  7],
       [ 7,  2,  6],
       [ 3,  0, 12],
       [ 5,  2,  8],
       [ 6,  2,  7],
       [ 7,  1,  7],
       [ 8,  1,  6],
       [ 6,  1,  8]])
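The result can be sanity-checked: every row should sum to the number of marbles drawn, and no entry can exceed the number of marbles of that color. Here is a minimal sketch, assuming NumPy 1.18 or later; the seed and the names draws and draws_count are illustrative and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(12345)   # seed used only for repeatability
colors = np.array([12, 4, 18])       # red, green, blue marbles in the urn
nsample = 15

# Ten independent draws of 15 marbles each; one row per draw.
draws = rng.multivariate_hypergeometric(colors, nsample, size=10)

# Each row accounts for exactly nsample marbles ...
print(draws.sum(axis=1))
# ... and never uses more marbles of a color than the urn contains.
print((draws <= colors).all())

# The documented `method` argument selects the algorithm:
# "marginals" is the default, "count" is the alternative.
draws_count = rng.multivariate_hypergeometric(colors, nsample, size=10,
                                              method="count")
```

Both methods sample from the same distribution; they differ only in how the variates are generated.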
The rest of this answer is now obsolete, but I'll leave it for posterity (whatever that means...).
You can implement it using repeated calls to numpy.random.hypergeometric. Whether that will be more efficient than your implementation depends on how many colors there are and how many balls of each color.
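The reason repeated calls work is that the multivariate hypergeometric distribution factorizes into a chain of ordinary hypergeometric draws. As a sketch, write n_1, ..., n_k for the ball counts, N for their sum, and x_i for the number of drawn balls of color i (this notation is introduced here for illustration and does not appear in the original answer):

```latex
\[
P(X_1 = x_1, \dots, X_k = x_k)
  = \frac{\prod_{i=1}^{k} \binom{n_i}{x_i}}{\binom{N}{m}},
  \qquad \sum_{i=1}^{k} x_i = m .
\]
\[
X_1 \sim \mathrm{Hypergeometric}\bigl(\text{ngood} = n_1,\;
  \text{nbad} = N - n_1,\; \text{nsample} = m\bigr),
\]
\[
X_i \mid x_1, \dots, x_{i-1} \sim \mathrm{Hypergeometric}\Bigl(
  \text{ngood} = n_i,\;
  \text{nbad} = \sum_{j > i} n_j,\;
  \text{nsample} = m - \sum_{j < i} x_j \Bigr),
  \qquad 1 < i < k,
\]
\[
X_k = m - \sum_{j < k} x_j .
\]
```

Each step treats the current color as "good" and all later colors as "bad", and draws only the balls not yet assigned to earlier colors.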
For example, here's a script that prints the result of drawing from an urn containing three colors (red, green and blue):
from __future__ import print_function
import numpy as np
nred = 12
ngreen = 4
nblue = 18
m = 15
red = np.random.hypergeometric(nred, ngreen + nblue, m)
green = np.random.hypergeometric(ngreen, nblue, m - red)
blue = m - (red + green)
print("red: %2i" % red)
print("green: %2i" % green)
print("blue: %2i" % blue)
Sample output:
red: 6
green: 1
blue: 8
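The following function generalizes that to choosing m balls given an array colors holding the number of each color:
def sample(m, colors):
    """
    Parameters
    ----------
    m : number of balls to draw from the urn
    colors : one-dimensional array of the number of balls of each color in the urn

    Returns
    -------
    One-dimensional array with the same length as `colors` containing the
    number of balls of each color in a random sample.
    """
    # remaining[i] is the total number of balls of colors i, i+1, ..., last.
    remaining = np.cumsum(colors[::-1])[::-1]
    # The `np.int` alias was removed from recent NumPy releases; use the builtin int.
    result = np.zeros(len(colors), dtype=int)
    for i in range(len(colors)-1):
        # Stop once all m balls have been assigned; the remaining counts stay 0.
        if m < 1:
            break
        # Color i is "good", all later colors are "bad".
        result[i] = np.random.hypergeometric(colors[i], remaining[i+1], m)
        m -= result[i]
    # Whatever has not been assigned yet must be of the last color.
    result[-1] = m
    return result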
For example,
>>> sample(10, [2, 4, 8, 16])
array([2, 3, 1, 4])
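For newer code, the same sequential approach can be written against the Generator API. This is only a sketch under stated assumptions: the name sample_urn, the optional rng argument, and the up-front validation are additions made here for illustration, not part of the original answer.

```python
import numpy as np


def sample_urn(m, colors, rng=None):
    """Draw m balls without replacement; return the count per color.

    Hypothetical variant of `sample` that uses numpy.random.Generator.
    """
    rng = np.random.default_rng() if rng is None else rng
    colors = np.asarray(colors)
    if m < 0 or m > colors.sum():
        raise ValueError("m must be between 0 and the total number of balls")
    # remaining[i] is the total number of balls of colors i, i+1, ..., last.
    remaining = np.cumsum(colors[::-1])[::-1]
    result = np.zeros(len(colors), dtype=int)
    for i in range(len(colors) - 1):
        if m < 1:
            break  # every ball has been assigned already
        # Color i is "good", all later colors are "bad".
        result[i] = rng.hypergeometric(colors[i], remaining[i + 1], m)
        m -= result[i]
    result[-1] = m  # the rest must be of the last color
    return result


urn = np.array([2, 4, 8, 16])
counts = sample_urn(10, urn, rng=np.random.default_rng(0))
print(counts, counts.sum())
```

Passing a seeded generator makes a run repeatable, which is convenient for testing. With NumPy 1.18 or later, rng.multivariate_hypergeometric does the same job in a single call.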