How to calculate p-values for pairwise correlation of columns in Pandas?

Views: 774  Posted: 2020/10/10 1:32:05  Tags: python, pandas, dataframe, correlation


Question

Pandas has a very handy method, df.corr() (DataFrame.corr), for computing the pairwise correlation of columns. That means it is possible to compare the correlations among any number of columns. For instance:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)))

     0   1   2   3   4   5   6   7   8   9
0    9  17  55  32   7  97  61  47  48  46
1    8  83  87  56  17  96  81   8  87   0
2   60  29   8  68  56  63  81   5  24  52
3   42  76   6  75   7  59  19  17   3  63
...

Now it is possible to compute the correlation between every pair of the 10 columns with df.corr(method='pearson'):

      0         1         2         3         4         5         6         7         8         9
0  1.000000  0.082789 -0.094096 -0.086091  0.163091  0.013210  0.167204 -0.002514  0.097481  0.091020
1  0.082789  1.000000  0.027158 -0.080073  0.056364 -0.050978 -0.018428 -0.014099 -0.135125 -0.043797
2 -0.094096  0.027158  1.000000 -0.102975  0.101597 -0.036270  0.202929  0.085181  0.093723 -0.055824
3 -0.086091 -0.080073 -0.102975  1.000000 -0.149465  0.033130 -0.020929  0.183301 -0.003853 -0.062889
4  0.163091  0.056364  0.101597 -0.149465  1.000000 -0.007567 -0.017212 -0.086300  0.177247 -0.008612
5  0.013210 -0.050978 -0.036270  0.033130 -0.007567  1.000000 -0.080148 -0.080915 -0.004612  0.243713
6  0.167204 -0.018428  0.202929 -0.020929 -0.017212 -0.080148  1.000000  0.135348  0.070330  0.008170
7 -0.002514 -0.014099  0.085181  0.183301 -0.086300 -0.080915  0.135348  1.000000 -0.114413 -0.111642
8  0.097481 -0.135125  0.093723 -0.003853  0.177247 -0.004612  0.070330 -0.114413  1.000000 -0.153564
9  0.091020 -0.043797 -0.055824 -0.062889 -0.008612  0.243713  0.008170 -0.111642 -0.153564  1.000000

Is there a simple way to also get the corresponding p-values (ideally in pandas), as they are returned, e.g., by scipy's kendalltau()?
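For a single pair of columns, scipy already returns both numbers at once. A minimal sketch, assuming two hypothetical random columns a and b drawn from the same 0 to 99 range as the example frame:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two hypothetical integer columns, mirroring the 0..99 values in the example frame
a = rng.integers(0, 100, size=100)
b = rng.integers(0, 100, size=100)

# kendalltau returns a result object that unpacks into (statistic, pvalue)
tau, p_value = stats.kendalltau(a, b)
print(f"tau = {tau:.4f}, p = {p_value:.4f}")
```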

Recommended answer

Probably just loop over the column pairs. That is basically what pandas does in its source code to generate the correlation matrix anyway:

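import pandas as pd
import numpy as np
from scipy import stats

# df is the DataFrame from the question; label both result matrices by its columns
df_corr = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)  # correlation matrix
df_p = pd.DataFrame(index=df.columns, columns=df.columns, dtype=float)     # matrix of p-values
for x in df.columns:
    for y in df.columns:
        # pearsonr returns the coefficient and its two-sided p-value
        r, p = stats.pearsonr(df[x], df[y])
        df_corr.loc[x, y] = r
        df_p.loc[x, y] = p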
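Because the Pearson p-value depends only on r and the number of rows n, a vectorized alternative to calling pearsonr for every pair is to convert the whole df.corr() matrix at once. This is a different technique from the loop, and it assumes method='pearson' and no missing values: under the null hypothesis of zero correlation (and the usual normality assumption), t = r·sqrt((n − 2)/(1 − r²)) follows a t-distribution with n − 2 degrees of freedom, and the two-sided p-value is 2·P(T > |t|). A sketch under those assumptions; pearsonr_pvalues is a hypothetical helper name, and a seeded random frame stands in for the question's df:

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearsonr_pvalues(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: two-sided Pearson p-values for all column pairs.

    Assumes no missing values, so every pair is based on the same n rows.
    """
    n = len(df)
    r = df.corr(method="pearson").to_numpy()
    # On the diagonal r == 1, so 1 - r**2 can be 0 (or slightly negative from rounding)
    with np.errstate(divide="ignore", invalid="ignore"):
        t_stat = r * np.sqrt((n - 2) / (1.0 - r**2))
    # Two-sided p-value from the t-distribution with n - 2 degrees of freedom
    p = 2 * stats.t.sf(np.abs(t_stat), n - 2)
    np.fill_diagonal(p, 0.0)  # a column against itself: r = 1, p = 0
    return pd.DataFrame(p, index=df.columns, columns=df.columns)


rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.integers(0, 100, size=(100, 10)))
df_p_fast = pearsonr_pvalues(frame)
print(df_p_fast.round(3))
```

Since everything comes from a single df.corr() call, this avoids K² calls to pearsonr; the trade-off is that it cannot handle columns with different numbers of non-missing values.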

If you want to exploit the fact that the matrix is symmetric, so that you only need to compute roughly half of the pairs, do:

mat = df.to_numpy().T  # one row per column of df
K = len(df.columns)
correl = np.empty((K, K), dtype=float)
p_vals = np.empty((K, K), dtype=float)

for i, ac in enumerate(mat):
    for j, bc in enumerate(mat):
        if i > j:
            continue  # lower triangle is filled by symmetry below
        r, p = stats.pearsonr(ac, bc)
        # r, p = stats.kendalltau(ac, bc)  # alternative: Kendall's tau
        correl[i, j] = correl[j, i] = r
        p_vals[i, j] = p_vals[j, i] = p

df_p = pd.DataFrame(p_vals, index=df.columns, columns=df.columns)
df_corr = pd.DataFrame(correl, index=df.columns, columns=df.columns)
# pd.concat([df_corr, df_p], keys=['corr', 'p_val'])
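pandas can also do this looping itself: DataFrame.corr() accepts a callable as method, which it calls with two 1-D arrays for each pair of columns and which must return a float. Passing a function that returns the p-value instead of the coefficient therefore yields a p-value matrix. Per the pandas documentation, the result always has 1 on the diagonal regardless of what the callable returns, so the sketch below overwrites the diagonal; NA values are excluded pairwise before the callable is invoked. A minimal sketch, with pearson_pvalue as a hypothetical helper name and a seeded random frame standing in for the question's df:

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearson_pvalue(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical helper: two-sided p-value of Pearson's r for two 1-D arrays."""
    _, p = stats.pearsonr(x, y)
    return p


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 10)))

df_corr = df.corr(method="pearson")
df_p = df.corr(method=pearson_pvalue)

# pandas forces 1.0 onto the diagonal for callables; a column against itself has p = 0
p_vals = df_p.to_numpy(copy=True)
np.fill_diagonal(p_vals, 0.0)
df_p = pd.DataFrame(p_vals, index=df.columns, columns=df.columns)

# Stack coefficients and p-values into one frame with an outer 'corr'/'p_val' level
combined = pd.concat([df_corr, df_p], keys=["corr", "p_val"])
print(combined.round(3))
```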
