Vectorizing `numpy.random.choice` for given 2D array of probabilities along an axis
Question
NumPy has the random.choice function, which allows you to sample from a categorical distribution. How would you repeat this over an axis? To illustrate what I mean, here is my current code:
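import numpy as np

categorical_distributions = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])
_, n = categorical_distributions.shape
# One np.random.choice call per row: draws an index in 0..n-1 using that row's probabilities
np.array([np.random.choice(n, p=row)
          for row in categorical_distributions])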
Ideally, I would like to eliminate the for loop.
Answer
Here's one vectorized way to get the random indices per row, with `a` as the 2D array of probabilities -
(a.cumsum(1) > np.random.rand(a.shape[0])[:,None]).argmax(1)
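To see why this one-liner works, it helps to trace it with fixed draws instead of random ones. The cumulative sum along each row turns the probabilities into right-hand bin edges on [0, 1). Comparing those edges against one uniform number per row marks every edge above the draw, and argmax returns the first True, which is the bin the draw fell into (inverse-CDF sampling). The values in `r` below are arbitrary stand-ins for `np.random.rand(a.shape[0])`, chosen only for illustration.

```python
import numpy as np

a = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])

# Cumulative sums per row, approximately [0.1, 0.4, 1.0] and [0.2, 0.6, 1.0]
edges = a.cumsum(1)

# Fixed stand-ins for np.random.rand(a.shape[0]): one "draw" per row (illustrative values)
r = np.array([0.35, 0.65])

# r[:, None] has shape (2, 1) and broadcasts against edges of shape (2, 3)
mask = edges > r[:, None]
# mask is [[False, True, True],
#          [False, False, True]]

# argmax on a boolean row returns the index of the first True
idx = mask.argmax(1)
print(idx)  # [1 2]
```

Because each row gets its own draw through `[:, None]`, all rows are sampled in a single array operation, which is what removes the Python-level loop.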
Generalizing to sample along either the rows or the columns of a 2D array -
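def random_choice_prob_index(a, axis=1):
    # One uniform draw per distribution (per row if axis=1, per column if axis=0), broadcastable along axis
    r = np.expand_dims(np.random.rand(a.shape[1-axis]), axis=axis)
    # Index of the first cumulative probability that exceeds the draw
    return (a.cumsum(axis=axis) > r).argmax(axis=axis)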
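Below is a variant of the generalized function. It is sketched under two assumptions that are not part of the original answer. First, the caller may pass a NumPy `Generator` (from `np.random.default_rng`) to get reproducible draws. Second, the cumulative sums are divided by their last entry. The division guards against a row whose floating-point total lands slightly below 1.0: in `random_choice_prob_index`, a draw above that total makes every comparison False, and argmax then silently returns 0. The name `random_choice_prob_index_rng` is hypothetical.

```python
import numpy as np

def random_choice_prob_index_rng(a, axis=1, rng=None):
    """Draw one category index per distribution stored in the 2D array `a`.

    axis=1: each row is a distribution (returns one index per row).
    axis=0: each column is a distribution (returns one index per column).
    Assumes every distribution has a positive total.
    """
    if rng is None:
        rng = np.random.default_rng()
    c = a.cumsum(axis=axis)
    # Rescale so the last cumulative value along `axis` is exactly 1.0
    c = c / np.take(c, [-1], axis=axis)
    # One uniform draw in [0, 1) per distribution, shaped to broadcast along `axis`
    r = np.expand_dims(rng.random(a.shape[1 - axis]), axis=axis)
    # First position whose cumulative probability exceeds the draw
    return (c > r).argmax(axis=axis)

rng = np.random.default_rng(0)
a = np.array([[.1, .3, .6],
              [.2, .4, .4]])
print(random_choice_prob_index_rng(a, rng=rng))            # one index per row
print(random_choice_prob_index_rng(a.T, axis=0, rng=rng))  # same distributions stored as columns
```

Since every draw is strictly below 1.0 and the rescaled last edge is exactly 1.0, at least the final comparison is always True. As a result, a zero-probability category can never be returned by accident.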
Let's verify with the given sample by running it over a million times -
In [589]: a = np.array([
...: [.1, .3, .6],
...: [.2, .4, .4],
...: ])
In [590]: choices = [random_choice_prob_index(a)[0] for i in range(1000000)]
# This should be close to first row of given sample
In [591]: np.bincount(choices)/float(len(choices))
Out[591]: array([ 0.099781, 0.299436, 0.600783])
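The check above calls the function a million times from Python and only inspects the first row. Suppose you instead want many independent draws for every row at once. In that case, the same comparison can be broadcast over an extra leading axis of uniform draws. The function name `sample_many` and its parameters are hypothetical. Memory grows as k × m × n booleans, so a very large `k` may need to be processed in chunks.

```python
import numpy as np

def sample_many(a, k, rng=None):
    """Return a (k, m) array: k independent category draws for each of the m rows of `a`."""
    if rng is None:
        rng = np.random.default_rng()
    edges = a.cumsum(axis=1)             # shape (m, n)
    r = rng.random((k, a.shape[0], 1))   # shape (k, m, 1): one draw per sample and row
    # (m, n) broadcasts against (k, m, 1) to (k, m, n); argmax picks the first edge above each draw
    return (edges > r).argmax(axis=2)

a = np.array([[.1, .3, .6],
              [.2, .4, .4]])
draws = sample_many(a, 1_000_000, rng=np.random.default_rng(0))

# Empirical frequencies per row; each row should be close to the matching row of `a`
counts = [np.bincount(draws[:, i], minlength=a.shape[1]) for i in range(a.shape[0])]
freqs = np.array(counts) / draws.shape[0]
print(freqs)
```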
Runtime test
Original loopy approach -
def loopy_app(categorical_distributions):
    m, n = categorical_distributions.shape
    out = np.empty(m, dtype=int)
    for i, row in enumerate(categorical_distributions):
        out[i] = np.random.choice(n, p=row)
    return out
Timings on a larger array -
In [593]: a = np.array([
...: [.1, .3, .6],
...: [.2, .4, .4],
...: ])
In [594]: a_big = np.repeat(a,100000,axis=0)
In [595]: %timeit loopy_app(a_big)
1 loop, best of 3: 2.54 s per loop
In [596]: %timeit random_choice_prob_index(a_big)
100 loops, best of 3: 6.44 ms per loop
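The `%timeit` lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The helper name `compare_timings` and its parameters are assumptions. The numbers it produces depend on your machine and NumPy version, so none are shown here.

```python
import timeit
import numpy as np

def loopy_app(categorical_distributions):
    m, n = categorical_distributions.shape
    out = np.empty(m, dtype=int)
    for i, row in enumerate(categorical_distributions):
        out[i] = np.random.choice(n, p=row)
    return out

def random_choice_prob_index(a, axis=1):
    r = np.expand_dims(np.random.rand(a.shape[1 - axis]), axis=axis)
    return (a.cumsum(axis=axis) > r).argmax(axis=axis)

def compare_timings(reps=100000, number=1):
    """Return (loop seconds, vectorized seconds) per call, best of 3 runs."""
    a = np.array([[.1, .3, .6],
                  [.2, .4, .4]])
    a_big = np.repeat(a, reps, axis=0)  # same construction as a_big above
    # timeit.repeat accepts a callable; each run calls it `number` times
    t_loop = min(timeit.repeat(lambda: loopy_app(a_big), repeat=3, number=number)) / number
    t_vec = min(timeit.repeat(lambda: random_choice_prob_index(a_big), repeat=3, number=number)) / number
    return t_loop, t_vec

# Usage (reps=100000 matches a_big above; the loop version may take several seconds):
# t_loop, t_vec = compare_timings()
# print(f"loop: {t_loop:.3f} s, vectorized: {t_vec * 1e3:.2f} ms")
```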