Computing the correlation coefficient between two multi-dimensional arrays
Question
I have two arrays with the shapes N x T and M x T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).
What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.
I'm expecting my output to be an array with the shape N x M.
N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
Here are some things to note:
- The numpy function correlate requires input arrays to be one-dimensional.
- The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
- The scipy.stats function pearsonr requires input arrays to be one-dimensional.
Recommended answer
Correlation (default "valid" case) between two 2D arrays:
You can simply use matrix multiplication with np.dot, like so:
out = np.dot(arr_one,arr_two.T)
For each pairwise combination of rows (row1, row2) from the two input arrays, the correlation in the default "valid" mode is the value at position (row1, row2) of the multiplication result.
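To make the equivalence concrete, here is a minimal sketch that compares the np.dot result with an explicit loop over np.correlate. The array sizes and the names arr_one, arr_two and looped are chosen only for illustration.

```python
import numpy as np

# Small illustrative inputs: 3 rows and 4 rows, both with 5 columns
rng = np.random.default_rng(0)
arr_one = rng.random((3, 5))
arr_two = rng.random((4, 5))

# Vectorized form from the answer
out = np.dot(arr_one, arr_two.T)

# Reference: np.correlate on each pair of 1-D rows; with equal-length
# inputs, "valid" mode returns a single value, the sum of products
looped = np.array([
    [np.correlate(r1, r2, mode="valid")[0] for r2 in arr_two]
    for r1 in arr_one
])

print(out.shape)                # (3, 4)
print(np.allclose(out, looped))
```

Note that this quantity is an unnormalized sum of products, not the Pearson coefficient asked about in the question.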
Row-wise correlation coefficient calculation for two 2D arrays:
import numpy as np

def corr2_coeff(A, B):
    # Subtract each row's mean from that row (center across columns)
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squared deviations for each row
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Pairwise sums of products, scaled by the product of the row norms
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB.
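As a sanity check, the result can be compared against np.corrcoef, which stacks the rows of its two arguments and returns the full (N+M) x (N+M) matrix; the top-right N x M block holds the cross-correlations. The sketch below repeats the function so that it runs on its own, and the array sizes are arbitrary.

```python
import numpy as np

def corr2_coeff(A, B):
    # Same computation as in the answer: center rows, then normalize
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)
A = rng.random((6, 20))   # N = 6, T = 20
B = rng.random((4, 20))   # M = 4, T = 20

result = corr2_coeff(A, B)

# np.corrcoef(A, B) treats every row of A and B as a variable;
# rows 0..N-1 against columns N..N+M-1 give the A-vs-B block
reference = np.corrcoef(A, B)[:A.shape[0], A.shape[0]:]

print(result.shape)                     # (6, 4)
print(np.allclose(result, reference))
```

The np.corrcoef route also computes the A-vs-A and B-vs-B blocks, which are discarded here, so it does more work than necessary when only the cross block is needed.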
Benchmarking
This section compares the runtime of the proposed approach against generate_correlation_map and the loopy pearsonr-based approach listed in the other answer (http://stackoverflow.com/a/30145770/3293881). The latter is taken from the function test_generate_correlation_map(), without the value-correctness verification code at its end. Note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.
Case #1:
In [106]: A = np.random.rand(1000,100)
In [107]: B = np.random.rand(1000,100)
In [108]: %timeit corr2_coeff(A,B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000,100)
In [111]: B = np.random.rand(5000,100)
In [112]: %timeit corr2_coeff(A,B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000,10)
In [115]: B = np.random.rand(10000,10)
In [116]: %timeit corr2_coeff(A,B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other loopy pearsonr-based approach seemed too slow, but here are the runtimes for one small data size:
In [118]: A = np.random.rand(1000,100)
In [119]: B = np.random.rand(1000,100)
In [120]: %timeit corr2_coeff(A,B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A,B)
1 loops, best of 3: 33 s per loop
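The timed version of the proposed approach included a column-count check that is not shown in the function listing. Here is a minimal sketch of what such a variant might look like, together with a plain timeit harness for use outside IPython. The name corr2_coeff_checked, the error message and the array sizes are assumptions made for illustration, not the code that produced the timings above.

```python
import timeit

import numpy as np

def corr2_coeff_checked(A, B):
    # Hypothetical wrapper: reject inputs whose column counts differ
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")

    # Same computation as corr2_coeff in the answer
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)
A = rng.random((200, 50))
B = rng.random((300, 50))

# Average seconds per call over a few repetitions; values depend on the machine
seconds = timeit.timeit(lambda: corr2_coeff_checked(A, B), number=5) / 5
print(f"{seconds * 1e3:.3f} ms per call")
```

The check runs in constant time, so it should have a negligible effect on the measured runtimes.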