Combined mean and standard deviation from a collection of NumPy arrays of different shapes
Question
Let's say I have NumPy arrays with shapes
(682, 89, 138)
(2668, 76, 89)
(491, 62, 48)
How should I calculate the mean and standard deviation of all three arrays combined? If they had the same shape, I could use np.stack() and then get the mean and std of the resulting array.
Is it possible to do this when the dimensions have different sizes? Or would I have to reshape before getting the mean and std?
Recommended answer
We can use the formulas for the mean and the standard deviation to compute those two scalar values over all input arrays without concatenating/stacking, which could be costly, especially on large NumPy arrays. Let's do it in steps: first the mean, then the standard deviation, since the std computation uses the mean.
Getting the combined mean value:
So, we start with the mean. For this, we get the scalar sum of each array, add those up to get the total sum, and finally divide by the total number of elements across all arrays.
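As a rough sketch of this step (the array names and the seeded random generator are illustrative assumptions, not part of the original answer), the combined mean can be built from per-array sums. It should also equal the average of the per-array means weighted by array size:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]

total_sum = sum(arr.sum() for arr in arrays)    # one scalar sum per array, then added up
total_count = sum(arr.size for arr in arrays)   # number of elements across all arrays
combined_mean = total_sum / total_count

# Equivalent view: average the per-array means, weighted by array size.
sizes = [arr.size for arr in arrays]
means = [arr.mean() for arr in arrays]
weighted_mean = np.average(means, weights=sizes)

print(np.isclose(combined_mean, weighted_mean))
```

Note that a plain, unweighted average of the per-array means would only be correct if all arrays had the same number of elements.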
Getting the combined standard deviation value:
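For the standard deviation, we have the formula (reconstructed here as the population form, which is what the code in this answer computes), where $\mu$ is the mean and $N$ is the total number of elements:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$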
So, we take the combined mean obtained in the previous step, use the std formula to get the sum of squared differences from that mean, divide by the total number of elements across all arrays, and then take the square root.
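Written out for $K$ arrays $A_1, \dots, A_K$ with $n_k$ elements each (these symbols are notation introduced here for illustration), the two steps amount to:

```latex
N = \sum_{k=1}^{K} n_k, \qquad
\mu = \frac{1}{N} \sum_{k=1}^{K} \sum_{x \in A_k} x, \qquad
\sigma = \sqrt{\frac{1}{N} \sum_{k=1}^{K} \sum_{x \in A_k} (x - \mu)^2}
```

Each inner sum runs over a single array, so the arrays never have to be joined into one.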
Implementation
Let's say the input arrays are a and b. One solution would be:
import numpy as np

N = float(a.size + b.size)       # total number of elements
mean_ = (a.sum() + b.sum())/N    # combined mean
std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)  # combined (population) std
Sample run for verification
In [266]: a = np.random.rand(3,4,2)
...: b = np.random.rand(2,5,3)
...:
In [267]: N = float(a.size + b.size)
...: mean_ = (a.sum() + b.sum())/N
...: std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)
...:
In [268]: mean_
Out[268]: 0.47854757879348042
In [270]: std_
Out[270]: 0.27890341338373376
Now, to verify, let's flatten the arrays, join them with np.hstack, and then use the built-in mean and std methods -
In [271]: A = np.hstack((a.ravel(), b.ravel()))
In [273]: A.mean()
Out[273]: 0.47854757879348037
In [274]: A.std()
Out[274]: 0.27890341338373376
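The agreement with A.std() relies on NumPy's default of ddof=0, i.e. the population standard deviation, which divides by N just like the formula used here. A small Python 3 check, assuming illustrative array names and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
a = rng.random((3, 4, 2))
b = rng.random((2, 5, 3))

N = a.size + b.size
mean_ = (a.sum() + b.sum()) / N
std_ = np.sqrt((((a - mean_) ** 2).sum() + ((b - mean_) ** 2).sum()) / N)

flat = np.concatenate((a.ravel(), b.ravel()))  # reference: flatten and join
print(np.isclose(mean_, flat.mean()))
print(np.isclose(std_, flat.std()))            # default ddof=0 divides by N
print(np.isclose(std_, flat.std(ddof=1)))      # sample std divides by N - 1, so it differs
```

If the sample standard deviation is wanted instead, dividing by N - 1 rather than N in the last step should give the ddof=1 result.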
List of arrays as input
For a list holding all those arrays, we need to iterate through them, like so -
A = [a,b,c] # input list of arrays
N = float(sum([i.size for i in A]))
mean_ = sum([i.sum() for i in A])/N
std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
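The same loop can be wrapped in a reusable function. The sketch below is one possible packaging in Python 3; the name combined_mean_std and the scaled-down shapes are hypothetical choices made for illustration:

```python
import numpy as np

def combined_mean_std(arrays):
    """Return (mean, population std) over all elements of the given arrays."""
    arrays = list(arrays)
    n_total = sum(arr.size for arr in arrays)
    mean = sum(arr.sum() for arr in arrays) / n_total
    # Sum of squared differences from the combined mean, one array at a time.
    sq_dev = sum(((arr - mean) ** 2).sum() for arr in arrays)
    return mean, np.sqrt(sq_dev / n_total)

rng = np.random.default_rng(2)
# Scaled-down stand-ins for three differently shaped 3-D arrays.
arrays = [rng.random((6, 8, 13)), rng.random((26, 7, 8)), rng.random((4, 6, 4))]
mean, std = combined_mean_std(arrays)
print(mean, std)
```

Each (arr - mean) expression still allocates a temporary the size of one input array, but never a copy of all the data at once.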
Sample run -
In [301]: a = np.random.rand(3,4,2)
...: b = np.random.rand(2,5,3)
...: c = np.random.rand(7,4)
...:
In [302]: A = [a,b,c] # input list of arrays
...: N = float(sum([i.size for i in A]))
...: mean_ = sum([i.sum() for i in A])/N
...: std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
...: print mean_, std_
...:
0.47703535428 0.293308550786
In [303]: A = np.hstack((a.ravel(), b.ravel(), c.ravel()))
...: print A.mean(), A.std()
...:
0.47703535428 0.293308550786