Computing the correlation coefficient between two multi-dimensional arrays
Question
I have two arrays with the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).
What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.
I'm expecting my output to be an array with the shape N X M.
N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
A few points to note:
- The numpy function correlate requires input arrays to be one-dimensional.
- The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
- The scipy.stats function pearsonr requires input arrays to be one-dimensional.
Recommended answer
Correlation (default "valid" case) between two 2D arrays:
You can simply use matrix multiplication with np.dot, like so:
out = np.dot(arr_one,arr_two.T)
In the default "valid" case, the correlation between each pairwise row combination (row1, row2) of the two input arrays corresponds to the multiplication result at position (row1, row2) of the output.
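As a quick sanity check of the statement above, the following minimal sketch compares the np.dot result with a row-by-row loop over np.correlate in "valid" mode. The array sizes and the random seed are arbitrary choices made for illustration and are not part of the original answer.

```python
import numpy as np

# Small illustrative inputs (sizes chosen arbitrarily): N x T and M x T
rng = np.random.default_rng(0)
arr_one = rng.random((4, 6))
arr_two = rng.random((3, 6))

# Vectorized version: one matrix product gives the full N x M result
out = np.dot(arr_one, arr_two.T)

# Loop version: for equal-length 1-D inputs, "valid" mode returns a
# single value, the sum of elementwise products of the two rows
looped = np.array([
    [np.correlate(row1, row2, mode="valid")[0] for row2 in arr_two]
    for row1 in arr_one
])

print(out.shape)
print(np.allclose(out, looped))
```

The loop is there only to confirm the equivalence; for real data the single np.dot call is the one to use.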
Rowwise correlation coefficient calculation for two 2D arrays:
import numpy as np

def corr2_coeff(A, B):
    # Rowwise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Finally get corr coeff: centered dot products, normalized by the
    # outer product of the per-row sums of squares
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB.
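One way to convince yourself that corr2_coeff returns Pearson coefficients is to compare it with np.corrcoef on the stacked rows. The sketch below repeats the function so it runs on its own. The array sizes, the seed, and the names result and reference are illustrative assumptions rather than part of the original answer.

```python
import numpy as np

def corr2_coeff(A, B):
    # Center each row, then normalize the centered dot products
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)
A = rng.random((5, 20))   # N x T
B = rng.random((3, 20))   # M x T

result = corr2_coeff(A, B)   # N x M

# np.corrcoef on the stacked rows gives an (N+M) x (N+M) matrix;
# its top-right N x M block holds the A-versus-B coefficients
full = np.corrcoef(np.vstack([A, B]))
reference = full[:A.shape[0], A.shape[0]:]

print(result.shape)
print(np.allclose(result, reference))
```

The np.corrcoef route also computes the A-versus-A and B-versus-B blocks, which are then discarded, so it does more work than needed when only the cross block is wanted.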
Benchmarking
This section compares the runtime of the proposed approach against generate_correlation_map and the loopy pearsonr based approach listed in the other answer (taken from the function test_generate_correlation_map(), without the value-correctness verification code at its end). Note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.
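The column check mentioned above is not shown in the answer itself. A minimal sketch of what it might look like follows; the wrapper name corr2_coeff_checked and the error message are hypothetical, and the exact check used for the timings may have differed.

```python
import numpy as np

def corr2_coeff_checked(A, B):
    # Hypothetical guard (assumption): both inputs must share the T dimension
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")

    # Same computation as corr2_coeff
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(0)
ok = corr2_coeff_checked(rng.random((4, 10)), rng.random((6, 10)))
print(ok.shape)

# Mismatched column counts are rejected before any arithmetic is done
try:
    corr2_coeff_checked(rng.random((4, 10)), rng.random((6, 8)))
except ValueError as exc:
    print("rejected:", exc)
```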
Case #1:
In [106]: A = np.random.rand(1000, 100)
In [107]: B = np.random.rand(1000, 100)
In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000, 100)
In [111]: B = np.random.rand(5000, 100)
In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000, 10)
In [115]: B = np.random.rand(10000, 10)
In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other loopy pearsonr based approach seemed too slow, but here are the runtimes for one small data size:
In [118]: A = np.random.rand(1000, 100)
In [119]: B = np.random.rand(1000, 100)
In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop
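The body of pearsonr_based is not reproduced here. For context, a loop-based version along these lines would call scipy.stats.pearsonr once per row pair, which helps explain the gap in the timings above. The sketch below is an assumption about its general shape, not the code from the other answer.

```python
import numpy as np
from scipy.stats import pearsonr

def pearsonr_based(A, B):
    # Assumed shape of the loopy approach: one pearsonr call per (row, row) pair
    out = np.empty((A.shape[0], B.shape[0]))
    for i, row_a in enumerate(A):
        for j, row_b in enumerate(B):
            # pearsonr returns (coefficient, p-value); keep the coefficient
            out[i, j] = pearsonr(row_a, row_b)[0]
    return out

rng = np.random.default_rng(0)
A = rng.random((6, 15))
B = rng.random((4, 15))

looped = pearsonr_based(A, B)

# Cross-check against the A-versus-B block of np.corrcoef on stacked rows
reference = np.corrcoef(np.vstack([A, B]))[:A.shape[0], A.shape[0]:]
print(np.allclose(looped, reference))
```

With 1000 rows in each input this makes a million Python-level calls, each of which also computes a p-value that is thrown away.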