Correlation coefficients for sparse matrix in python?


Question

Does anyone know how to compute a correlation matrix from a very large sparse matrix in python? Basically, I am looking for something like numpy.corrcoef that will work on a scipy sparse matrix.

Recommended answer

You can compute the correlation coefficients fairly straightforwardly from the covariance matrix like this:

import numpy as np
from scipy import sparse

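def sparse_corrcoef(A, B=None):
    """Row-wise correlation coefficients of sparse A (stacked with B, if given).

    Each row is treated as a variable, as in np.corrcoef; the result is dense.
    """
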
    if B is not None:
        A = sparse.vstack((A, B), format='csr')

    A = A.astype(np.float64)
    n = A.shape[1]

    # Compute the covariance matrix
    rowsum = A.sum(1)
    centering = rowsum.dot(rowsum.T.conjugate()) / n
    C = (A.dot(A.T.conjugate()) - centering) / (n - 1)

    # The correlation coefficients are given by
    # C_{i,j} / sqrt(C_{i,i} * C_{j,j})
    d = np.diag(C)
    coeffs = C / np.sqrt(np.outer(d, d))

    return coeffs
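
The covariance step in the function above can be read off from a short derivation. This is only a sketch for real-valued data; the symbols $a_{ik}$ (entry of A in row $i$, column $k$), $s_i$ (sum of row $i$) and $R$ (the correlation matrix) are notation introduced here, not part of the original answer.

```latex
\begin{align*}
s_i &= \sum_{k=1}^{n} a_{ik}, \qquad \mu_i = \frac{s_i}{n} \\
C_{ij} &= \frac{1}{n-1} \sum_{k=1}^{n} (a_{ik} - \mu_i)(a_{jk} - \mu_j)
        = \frac{1}{n-1} \left( \sum_{k=1}^{n} a_{ik} a_{jk} - \frac{s_i s_j}{n} \right) \\
R_{ij} &= \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}}
\end{align*}
```

The sum $\sum_k a_{ik} a_{jk}$ is the $(i, j)$ entry of A.dot(A.T), and $s_i s_j / n$ is the $(i, j)$ entry of the centering term, so the rows never have to be centered explicitly (which would destroy their sparsity).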

Check that it works OK:

# some smallish sparse random matrices
a = sparse.rand(100, 100000, density=0.1, format='csr')
b = sparse.rand(100, 100000, density=0.1, format='csr')

coeffs1 = sparse_corrcoef(a, b)
coeffs2 = np.corrcoef(a.toarray(), b.toarray())

print(np.allclose(coeffs1, coeffs2))
# True

Be warned:

The amount of memory required for computing the covariance matrix C will be heavily dependent on the sparsity structure of A (and B, if given). For example, if A is an (m, n) matrix containing just a single column of non-zero values, then A.dot(A.T) will be an (m, m) matrix containing all non-zero values. The centering term is a dense (m, m) matrix as well, so C itself is always dense. If m is large then this could be very bad news in terms of memory consumption.
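
If the full dense result is too large to hold at once, one possible workaround is to compute only a block of rows of the correlation matrix at a time. The following is a minimal sketch of that idea, not part of the original answer: the helper name `sparse_corrcoef_rows` and its `rows` parameter are hypothetical, and it assumes real-valued data with no constant rows (a constant row has zero variance and would produce a division by zero).

```python
import numpy as np
from scipy import sparse


def sparse_corrcoef_rows(A, rows):
    """Correlation of the selected rows of A against all rows of A.

    Hypothetical helper: returns a dense (len(rows), m) array rather
    than the full (m, m) correlation matrix.
    """
    A = sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[1]

    # Per-row sums and sums of squares as flat 1-D arrays
    s = np.asarray(A.sum(axis=1)).ravel()
    sq = np.asarray(A.multiply(A).sum(axis=1)).ravel()

    # Unnormalised variance of each row: sum(x^2) - sum(x)^2 / n
    var = sq - s * s / n

    # Unnormalised covariance between the block rows and every row
    block = A[rows]
    cov = (block @ A.T).toarray() - np.outer(s[rows], s) / n

    # The 1 / (n - 1) factors cancel in the ratio, so they are omitted
    return cov / np.sqrt(np.outer(var[rows], var))
```

Calling it with, say, `rows=slice(0, 1000)` and then `rows=slice(1000, 2000)` lets each block be processed or written to disk before the next one is computed, so only one (block, m) slab is in memory at a time.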