Computing the correlation coefficient between two multi-dimensional arrays


Problem description

I have two arrays with the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).

What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.

I'm expecting my output to be an array with the shape N X M.

N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
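
For reference, the quantity being asked for can be written out as follows. The symbols A and B for the N X T and M X T arrays, and r for the N X M result, are names assumed here for illustration; they do not appear in the question itself.

```latex
r_{nm} \;=\;
\frac{\sum_{t=1}^{T} \left(A_{nt} - \bar{A}_n\right)\left(B_{mt} - \bar{B}_m\right)}
     {\sqrt{\sum_{t=1}^{T} \left(A_{nt} - \bar{A}_n\right)^2}\;
      \sqrt{\sum_{t=1}^{T} \left(B_{mt} - \bar{B}_m\right)^2}},
\qquad
\bar{A}_n = \frac{1}{T}\sum_{t=1}^{T} A_{nt},
\quad
\bar{B}_m = \frac{1}{T}\sum_{t=1}^{T} B_{mt}
```

Each entry r_{nm} lies in [-1, 1] and is undefined when either row is constant, since the corresponding sum of squares in the denominator is then zero.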

Here are some things to note:

  • The numpy function correlate requires input arrays to be one-dimensional.
  • The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
  • The scipy.stats function pearsonr requires input arrays to be one-dimensional.

Recommended answer

Correlation (default "valid" case) between two 2D arrays:

You can simply use matrix multiplication with np.dot, like so:

out = np.dot(arr_one,arr_two.T)

With the default "valid" mode, the correlation between each pairwise row combination (row1, row2) of the two input arrays corresponds to the matrix-multiplication result at position (row1, row2).
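
As a rough check of that correspondence, the sketch below compares the np.dot result with a loop that calls np.correlate in "valid" mode on each pair of rows. The array sizes, the random seed, and the names ref and matches are assumptions made for illustration. Note that this is the raw (unnormalized) correlation, not the Pearson coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
arr_one = rng.random((3, 5))  # N x T (sizes chosen only for illustration)
arr_two = rng.random((4, 5))  # M x T

# Matrix product: entry (i, j) is the sum over T of arr_one[i] * arr_two[j]
out = np.dot(arr_one, arr_two.T)

# Reference: loop over every row pair and call np.correlate in "valid" mode,
# which yields a single value when both inputs have the same length
ref = np.empty((arr_one.shape[0], arr_two.shape[0]))
for i in range(arr_one.shape[0]):
    for j in range(arr_two.shape[0]):
        ref[i, j] = np.correlate(arr_one[i], arr_two[j], mode="valid")[0]

matches = np.allclose(out, ref)
print(out.shape, matches)
```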

Row-wise correlation coefficient calculation for two 2D arrays:

import numpy as np

def corr2_coeff(A, B):
    # Subtract each row's mean from that row (A is N x T, B is M x T)
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squared deviations along each row
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Pearson correlation coefficients, shape N x M
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
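
One way to sanity-check the function is to compare it with a slow pairwise loop over np.corrcoef on one-dimensional rows. The sketch below restates the function so that it runs on its own (using keepdims and np.outer, which are equivalent to the indexing above); the array sizes, the seed, and the names fast, slow and agree are assumptions for illustration.

```python
import numpy as np

def corr2_coeff(A, B):
    # Center each row, then normalize the cross products by the row norms
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.outer(ssA, ssB))

rng = np.random.default_rng(42)
A = rng.random((6, 20))  # N x T
B = rng.random((4, 20))  # M x T

fast = corr2_coeff(A, B)

# Slow reference: Pearson coefficient for every (row of A, row of B) pair
slow = np.array([[np.corrcoef(a, b)[0, 1] for b in B] for a in A])

agree = np.allclose(fast, slow)
print(fast.shape, agree)
```

A row with zero variance makes the denominator zero, so the corresponding entries come out as nan; inputs with constant rows would need to be handled separately.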

This is based upon this solution to How to apply corr2 functions in Multidimentional arrays in MATLAB

Benchmarking

This section compares the runtime performance of the proposed approach against the generate_correlation_map and loopy pearsonr-based approaches listed in the other answer (http://stackoverflow.com/a/30145770/3293881); the latter is taken from the function test_generate_correlation_map() without the value-correctness verification code at the end of it. Please note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.
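
The exact form of the timed function is not shown, so the following is only a sketch of what a version with the column check might look like, timed with the standard-library timeit module instead of IPython's %timeit. The name corr2_coeff_checked, the error message, and the array sizes are assumptions; absolute timings depend on the machine and the numpy build.

```python
import timeit
import numpy as np

def corr2_coeff_checked(A, B):
    # Hypothetical variant: verify that both inputs share the same T
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.outer(ssA, ssB))

rng = np.random.default_rng(0)
A = rng.random((200, 50))
B = rng.random((150, 50))

# Best of 3 repeats, 10 calls each, reported per call
runs = timeit.repeat(lambda: corr2_coeff_checked(A, B), number=10, repeat=3)
per_call = min(runs) / 10
print(f"{per_call * 1e3:.3f} ms per call")
```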

Case #1:

In [106]: A = np.random.rand(1000,100)

In [107]: B = np.random.rand(1000,100)

In [108]: %timeit corr2_coeff(A,B)
100 loops, best of 3: 15 ms per loop

In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop

Case #2:

In [110]: A = np.random.rand(5000,100)

In [111]: B = np.random.rand(5000,100)

In [112]: %timeit corr2_coeff(A,B)
1 loops, best of 3: 368 ms per loop

In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop

Case #3:

In [114]: A = np.random.rand(10000,10)

In [115]: B = np.random.rand(10000,10)

In [116]: %timeit corr2_coeff(A,B)
1 loops, best of 3: 1.29 s per loop

In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop

The other loopy pearsonr-based approach seemed too slow, but here are the runtimes for one small data size:

In [118]: A = np.random.rand(1000,100)

In [119]: B = np.random.rand(1000,100)

In [120]: %timeit corr2_coeff(A,B)
100 loops, best of 3: 15.3 ms per loop

In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop

In [122]: %timeit pearsonr_based(A,B)
1 loops, best of 3: 33 s per loop
