矩阵的所有行对的相关系数和 p 值 [英] Correlation coefficients and p values for all pairs of rows of a matrix

查看:30
本文介绍了矩阵的所有行对的相关系数和 p 值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有 m 行和 n 列的矩阵 data.我曾经使用 np.corrcoef:

I have a matrix data with m rows and n columns. I used to compute the correlation coefficients between all pairs of rows using np.corrcoef:

import numpy as np
data = np.array([[0, 1, -1], [0, -1, 1]])
np.corrcoef(data)

现在我还想看看这些系数的 p 值.np.corrcoef 不提供这些;scipy.stats.pearsonr 确实如此.但是,scipy.stats.pearsonr 不接受输入矩阵.

Now I would also like to have a look at the p-values of these coefficients. np.corrcoef doesn't provide these; scipy.stats.pearsonr does. However, scipy.stats.pearsonr does not accept a matrix on input.

是否有一种快速的方法可以计算所有行对的系数和 p 值(例如到达两个 m × m 矩阵,其中一个具有相关系数,另一个具有相应的 p 值)而不必手动遍历所有对?

Is there a quick way how to compute both the coefficient and the p-value for all pairs of rows (arriving e.g. at two m by m matrices, one with correlation coefficients, the other with corresponding p-values) without having to manually go through all pairs?

推荐答案

我今天遇到了同样的问题.

I have encountered the same problem today.

在谷歌搜索半小时后,我在 numpy/scipy 库中找不到任何代码可以帮助我做到这一点.

After half an hour of googling, I can't find any code in numpy/scipy library can help me do this.

所以我写了自己的corrcoef

import numpy as np
from scipy.stats import pearsonr, betai

def corrcoef(matrix):
    r = np.corrcoef(matrix)
    rf = r[np.triu_indices(r.shape[0], 1)]
    df = matrix.shape[1] - 2
    ts = rf * rf * (df / (1 - rf * rf))
    pf = betai(0.5 * df, 0.5, df / (df + ts))
    p = np.zeros(shape=r.shape)
    p[np.triu_indices(p.shape[0], 1)] = pf
    p[np.tril_indices(p.shape[0], -1)] = p.T[np.tril_indices(p.shape[0], -1)]
    p[np.diag_indices(p.shape[0])] = np.ones(p.shape[0])
    return r, p

def corrcoef_loop(matrix):
    rows, cols = matrix.shape[0], matrix.shape[1]
    r = np.ones(shape=(rows, rows))
    p = np.ones(shape=(rows, rows))
    for i in range(rows):
        for j in range(i+1, rows):
            r_, p_ = pearsonr(matrix[i], matrix[j])
            r[i, j] = r[j, i] = r_
            p[i, j] = p[j, i] = p_
    return r, p

第一个版本使用 np.corrcoef 的结果,然后根据 corrcoef 矩阵的三角形上值计算 p 值.

The first version use the result of np.corrcoef, and then calculate p-value based on triangle-upper values of corrcoef matrix.

第二个循环版本只是遍历行,手动执行 pearsonr.

The second loop version just iterating over rows, do pearsonr manually.

def test_corrcoef():
    a = np.array([
        [1, 2, 3, 4],
        [1, 3, 1, 4],
        [8, 3, 8, 5],
        [2, 3, 2, 1]])

    r1, p1 = corrcoef(a)
    r2, p2 = corrcoef_loop(a)

    assert np.allclose(r1, r2)
    assert np.allclose(p1, p2)

测试通过,他们是一样的.

The test passed, they are the same.

def test_timing():
    import time
    a = np.random.randn(100, 2500)

    def timing(func, *args, **kwargs):
        t0 = time.time()
        loops = 10
        for _ in range(loops):
            func(*args, **kwargs)
        print('{} takes {} seconds loops={}'.format(
            func.__name__, time.time() - t0, loops))

    timing(corrcoef, a)
    timing(corrcoef_loop, a)


if __name__ == '__main__':
    test_corrcoef()
    test_timing()

我的 Macbook 在 100x2500 矩阵上的表现

The performance on my Macbook against 100x2500 matrix

corrcoef 需要 0.06608104705810547 秒 loops=10

corrcoef takes 0.06608104705810547 seconds loops=10

corrcoef_loop 需要 7.585600137710571 秒 loops=10

corrcoef_loop takes 7.585600137710571 seconds loops=10

这篇关于矩阵的所有行对的相关系数和 p 值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆