Combined mean and standard deviation from a collection of NumPy arrays of different shapes

Question

Let's say I have NumPy arrays with shapes

(682, 89, 138)
(2668, 76, 89)
(491, 62, 48)

How should I calculate the mean and standard deviation of all three arrays combined? If they had the same shape, I could use np.stack() and then take the mean and std of the resulting array.

Is it possible to do this when the dimensions have different sizes? Or would I have to reshape before getting the mean and std?

Recommended answer

We can use the formulas for the mean and standard deviation to compute those two scalar values across all input arrays without concatenating or stacking them, which could be costly, especially for large NumPy arrays. Let's do it in steps: the mean first and then the standard deviation, since the std computation uses the mean.

Getting the combined mean value:

We start with the mean. For this, we compute the scalar sum of each array, add those sums together to get the total, and finally divide by the total number of elements in all arrays.
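The recipe above can also be read as a size-weighted average of the per-array means. The following is a minimal sketch of that equivalence, assuming three small random arrays as stand-ins for the real data; the names `arrays`, `combined_mean`, `weighted_mean`, and `naive_mean` are illustrative and not from the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data is illustrative only
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]

# Total sum divided by total element count, as described above.
n_total = sum(arr.size for arr in arrays)
combined_mean = sum(arr.sum() for arr in arrays) / n_total

# Equivalent view: average the per-array means, weighted by array size.
sizes = np.array([arr.size for arr in arrays])
means = np.array([arr.mean() for arr in arrays])
weighted_mean = np.average(means, weights=sizes)

# An unweighted average of the per-array means ignores the sizes
# and in general gives a different number.
naive_mean = means.mean()
```

The weighting matters because the arrays hold different numbers of elements; a plain average of the three means would give the smallest array as much influence as the largest.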

Getting the combined standard deviation value:

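For the standard deviation, the formula is:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

where $N$ is the total number of elements across all arrays and $\mu$ is the combined mean.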

So, we take the combined mean obtained in the previous step, compute the squared differences from it for every element, sum them, divide by the total number of elements across all arrays, and then apply the square root.
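As a cross-check on this step, the same standard deviation can be assembled from per-array statistics alone, using the standard variance decomposition (spread within each array plus spread of each array's mean around the combined mean). This is an alternative formulation, not the method used in the answer, and the variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed; the data is illustrative only
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]

sizes = np.array([arr.size for arr in arrays], dtype=float)
means = np.array([arr.mean() for arr in arrays])
variances = np.array([arr.var() for arr in arrays])  # population variance (ddof=0)

n_total = sizes.sum()
combined_mean = (sizes * means).sum() / n_total

# Within-array variance plus the squared offset of each array's mean,
# both weighted by the array's element count.
combined_var = (sizes * (variances + (means - combined_mean) ** 2)).sum() / n_total
combined_std = np.sqrt(combined_var)

# Direct application of the formula described above, for comparison.
direct_std = np.sqrt(
    sum(((arr - combined_mean) ** 2).sum() for arr in arrays) / n_total
)
```

This form can be handy when only the size, mean, and variance of each array are kept around, for example when the arrays are processed one at a time and then discarded.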

Implementation

Let's say the input arrays are a and b. One solution would look like this -

import numpy as np

N = float(a.size + b.size)  # total number of elements in both arrays
mean_ = (a.sum() + b.sum())/N  # combined mean
std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)  # population std

Sample run for verification

In [266]: a = np.random.rand(3,4,2)
     ...: b = np.random.rand(2,5,3)
     ...: 

In [267]: N = float(a.size + b.size)
     ...: mean_ = (a.sum() + b.sum())/N
     ...: std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)
     ...: 

In [268]: mean_
Out[268]: 0.47854757879348042

In [270]: std_
Out[270]: 0.27890341338373376

Now, to verify, let's flatten and concatenate the arrays and then use the built-in mean and std methods -

In [271]: A = np.hstack((a.ravel(), b.ravel()))

In [273]: A.mean()
Out[273]: 0.47854757879348037

In [274]: A.std()
Out[274]: 0.27890341338373376
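The two mean values printed above differ in the last digit, which is ordinary floating-point rounding from summing in a different order. A tolerance-based comparison is therefore safer than exact equality. The sketch below assumes fresh random inputs rather than the ones from the session, and the `*_matches` names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed; the data is illustrative only
a = rng.random((3, 4, 2))
b = rng.random((2, 5, 3))

n_total = a.size + b.size
mean_ = (a.sum() + b.sum()) / n_total
std_ = np.sqrt((((a - mean_) ** 2).sum() + ((b - mean_) ** 2).sum()) / n_total)

# Reference values from the flattened, concatenated data.
flat = np.concatenate((a.ravel(), b.ravel()))

mean_matches = np.isclose(mean_, flat.mean())
std_matches = np.isclose(std_, flat.std())  # std() defaults to ddof=0

# Dividing by N gives the population std; the sample std (ddof=1) differs.
sample_std_matches = np.isclose(std_, flat.std(ddof=1))
```

If the sample standard deviation is wanted instead, the divisor in the formula would be N - 1 rather than N.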


List of arrays as input

For a list holding all those arrays, we need to iterate through them, like so -

A = [a,b,c] # input list of arrays

N = float(sum([i.size for i in A]))
mean_ = sum([i.sum() for i in A])/N
std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
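The list version can be wrapped in a small reusable function. This is a sketch in Python 3 syntax; the function name `combined_mean_std` and the empty-input checks are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np

def combined_mean_std(arrays):
    """Return (mean, population std) over all elements of the given arrays.

    The arrays may have different shapes; they are never concatenated.
    """
    arrays = list(arrays)
    n_total = sum(arr.size for arr in arrays)
    if n_total == 0:
        raise ValueError("need at least one non-empty array")

    mean = sum(arr.sum() for arr in arrays) / n_total
    # Each (arr - mean) temporary is only as large as one input array.
    sq_dev = sum(((arr - mean) ** 2).sum() for arr in arrays)
    return mean, np.sqrt(sq_dev / n_total)

# Usage with three differently shaped arrays (illustrative data only).
rng = np.random.default_rng(3)
arrays = [rng.random((6, 8, 13)), rng.random((26, 7, 8)), rng.random((4, 6, 4))]
mean_, std_ = combined_mean_std(arrays)
```

Peak extra memory here is roughly the size of the largest single array, whereas concatenating first needs a copy of all the data at once.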

Sample run -

In [301]: a = np.random.rand(3,4,2)
     ...: b = np.random.rand(2,5,3)
     ...: c = np.random.rand(7,4)
     ...: 

In [302]: A = [a,b,c] # input list of arrays
     ...: N = float(sum([i.size for i in A]))
     ...: mean_ = sum([i.sum() for i in A])/N
     ...: std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
     ...: print mean_, std_
     ...: 
0.47703535428 0.293308550786

In [303]: A = np.hstack((a.ravel(), b.ravel(), c.ravel()))
     ...: print A.mean(), A.std()
     ...: 
0.47703535428 0.293308550786
