Computing the correlation coefficient between two multi-dimensional arrays


Question

I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).

What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.

I'm expecting my output to be an array with the shape N X M.

N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.

A few things to note:

  • The numpy function correlate requires input arrays to be one-dimensional.
  • The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
  • The scipy.stats function pearsonr requires input arrays to be one-dimensional.

Recommended answer

Correlation (default "valid" case) between two 2D arrays:

You can simply use matrix-multiplication np.dot like so -

out = np.dot(arr_one,arr_two.T)

For the default "valid" mode, the correlation between each pair of rows (row1, row2) from the two input arrays is the value of the matrix product at position (row1, row2).
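
As a quick check of that claim, here is a minimal sketch comparing one entry of the matrix product with np.correlate in "valid" mode on the corresponding pair of rows. The array sizes and the chosen indices i, j are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
arr_one = rng.random((3, 5))   # N x T
arr_two = rng.random((4, 5))   # M x T

# Matrix product: one value per (row of arr_one, row of arr_two) pair
out = np.dot(arr_one, arr_two.T)   # shape (3, 4)

# For two equal-length 1-D inputs, "valid" mode returns a single value:
# the sum of elementwise products of the two rows.
i, j = 1, 2
pair = np.correlate(arr_one[i], arr_two[j], mode="valid")

print(out.shape, np.isclose(out[i, j], pair[0]))
```

Note that this is the unnormalized correlation (a dot product), not the Pearson coefficient asked about in the question.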

Row-wise correlation coefficient calculation for two 2D arrays:

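import numpy as np

def corr2_coeff(A, B):
    """Pearson correlation of every row of A (N x T) with every row of B (M x T)."""
    # Row-wise mean of the input arrays, subtracted from the arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squared deviations for each row
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Cross products divided by the outer product of the row norms: N x M result
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))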

This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB
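
One way to sanity-check the function is to compare it with np.corrcoef applied to the two arrays stacked row-wise, then keep only the block that pairs rows of A with rows of B. The sketch below repeats corr2_coeff so that it runs on its own; the sizes N, M, T and the seed are arbitrary assumptions.

```python
import numpy as np

def corr2_coeff(A, B):
    # Same computation as in the answer: center rows, then normalize cross products
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))

rng = np.random.default_rng(42)
N, M, T = 6, 4, 50
A = rng.random((N, T))
B = rng.random((M, T))

fast = corr2_coeff(A, B)

# np.corrcoef on the stacked rows gives an (N+M) x (N+M) matrix;
# the top-right N x M block holds the A-row versus B-row coefficients.
reference = np.corrcoef(np.vstack([A, B]))[:N, N:]

matches = np.allclose(fast, reference)
print(fast.shape, matches)
```

The stacked np.corrcoef call also computes the A-versus-A and B-versus-B blocks, so it does more work than needed; it is used here only as a reference for correctness.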

Benchmarking

This section compares the runtime of the proposed approach against generate_correlation_map and the loopy pearsonr-based approach listed in the other answer (taken from the function test_generate_correlation_map(), without the value-correctness verification code at the end of it). Note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.

Case #1:

In [106]: A = np.random.rand(1000, 100)

In [107]: B = np.random.rand(1000, 100)

In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop

In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop

Case #2:

In [110]: A = np.random.rand(5000, 100)

In [111]: B = np.random.rand(5000, 100)

In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop

In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop

Case #3:

In [114]: A = np.random.rand(10000, 10)

In [115]: B = np.random.rand(10000, 10)

In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop

In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
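
The timed version with the column check is not shown in the answer. A minimal sketch of what it might look like, together with a way to time it outside IPython using the standard timeit module, follows. The name corr2_coeff_checked, the error message, and the array sizes are assumptions for illustration, and the numbers it prints will differ from the timings listed here.

```python
import timeit
import numpy as np

def corr2_coeff_checked(A, B):
    # Hypothetical wrapper: the same computation as corr2_coeff,
    # preceded by a check for an equal number of columns.
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.ndim != 2 or B.ndim != 2 or A.shape[1] != B.shape[1]:
        raise ValueError("A and B must be 2-D with the same number of columns")
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    ssA = (A_mA**2).sum(axis=1)
    ssB = (B_mB**2).sum(axis=1)
    return (A_mA @ B_mB.T) / np.sqrt(np.outer(ssA, ssB))

A = np.random.rand(200, 100)
B = np.random.rand(300, 100)
result = corr2_coeff_checked(A, B)

# Best of 3 repeats, 10 calls each, reported per call
runs = timeit.repeat(lambda: corr2_coeff_checked(A, B), number=10, repeat=3)
best = min(runs) / 10
print(f"best of 3: {best * 1e3:.3f} ms per call")
```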

The other loopy pearsonr based approach seemed too slow, but here are the runtimes for one small datasize -

In [118]: A = np.random.rand(1000, 100)

In [119]: B = np.random.rand(1000, 100)

In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop

In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop

In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop
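
For context on why a loop-based approach is so much slower, here is a sketch of what such an approach might look like. It is not the code from the other answer: the name pairwise_loop_corr is hypothetical, and it uses np.corrcoef on each pair of rows rather than scipy.stats.pearsonr so that it needs only numpy.

```python
import numpy as np

def pairwise_loop_corr(A, B):
    # Hypothetical loop-based baseline: one Python-level call per (n, m) pair
    N, M = A.shape[0], B.shape[0]
    out = np.empty((N, M))
    for n in range(N):
        for m in range(M):
            # np.corrcoef on two 1-D inputs returns a 2 x 2 matrix;
            # the off-diagonal entry is the Pearson coefficient of the pair.
            out[n, m] = np.corrcoef(A[n], B[m])[0, 1]
    return out

rng = np.random.default_rng(1)
A = rng.random((5, 20))
B = rng.random((3, 20))
looped = pairwise_loop_corr(A, B)
print(looped.shape)
```

The loop makes N times M separate function calls, each with its own overhead, whereas the matrix-product formulation does the same arithmetic in a handful of vectorized operations.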
