Correlation coefficient and p-value for each row within a DataFrame


Problem description

I have a matrix that looks like the following:

import pandas as pd

foo = pd.DataFrame(
        [['ASP1',12.45,12.65,1.54,1.56],
        ['ASP2',4.5,1.4,0.03,1.987],
        ['ASP3',0.12,0.34,0.45,0.9],
        ['ASP4',0.65,0.789,0.01,0.876]],
        columns = ('Sam','C1','C2','B1','B2'))
foo
    Sam     C1      C2    B1     B2
0  ASP1  12.45  12.650  1.54  1.560
1  ASP2   4.50   1.400  0.03  1.987
2  ASP3   0.12   0.340  0.45  0.900
3  ASP4   0.65   0.789  0.01  0.876

I want to run a correlation test for each row in Sam between the columns C1..C2 and B1..B2. In the end, I am aiming for a result matrix like the following:

foo_result = pd.DataFrame(
        [['C',0.76,0.06],
        ['B',0.34,0.10]],
        columns = ('Gene','Correlation_coefficent','P-value'))
foo_result

  Gene  Correlation_coefficent  P-value
0    C                    0.76     0.06
1    B                    0.34     0.10

Any suggestions or solutions would be great. Thank you.

Recommended answer

This should do it:

from scipy.stats import pearsonr

# Column groups: every column whose name starts with 'C' or 'B'
c_values = [column for column in foo.columns if column.startswith('C')]
b_values = [column for column in foo.columns if column.startswith('B')]

# Run pearsonr row by row; each call yields (coefficient, p-value)
results = [pearsonr(row[c_values].astype(float), row[b_values].astype(float))
           for _, row in foo.iterrows()]
foo['Correlation_coefficent'], foo['P-value'] = zip(*results)
foo_result = foo[['Sam', 'Correlation_coefficent', 'P-value']]

Output:

    Sam  Correlation_coefficent  P-value
0  ASP1                     1.0      0.0
1  ASP2                    -1.0      0.0
2  ASP3                     1.0      0.0
3  ASP4                     1.0      0.0

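The reason you get these results is the number of values in each group. With only two values per group (C1, C2 versus B1, B2), each row contributes just two points, and two points always lie on a straight line, so the coefficient is always exactly 1 or -1 and the p-value carries no information (its exact value may differ across SciPy versions). Hopefully your original data has at least 3 values per group.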
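To illustrate the point about the number of values, here is a minimal sketch of a row-wise helper that refuses to report a correlation when a group has fewer than three values. The helper name `rowwise_pearson`, the `min_values` parameter, and the `demo` data with columns `C3` and `B3` are assumptions made up for this example; they are not part of the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def rowwise_pearson(df, x_cols, y_cols, min_values=3):
    """Correlate x_cols with y_cols within each row of df.

    Rows with fewer than min_values paired values get NaN, because
    two points always give a coefficient of exactly +1 or -1.
    """
    records = []
    for _, row in df.iterrows():
        x = row[x_cols].astype(float).to_numpy()
        y = row[y_cols].astype(float).to_numpy()
        if len(x) < min_values:
            records.append((np.nan, np.nan))
        else:
            r, p = pearsonr(x, y)
            records.append((float(r), float(p)))
    return pd.DataFrame(records,
                        columns=['Correlation_coefficent', 'P-value'],
                        index=df.index)


# Hypothetical data with three values per group (not from the question)
demo = pd.DataFrame(
    [['G1', 1.0, 2.0, 3.0, 2.0, 4.0, 6.5],
     ['G2', 1.0, 2.0, 3.0, 5.0, 1.0, 4.0]],
    columns=['Sam', 'C1', 'C2', 'C3', 'B1', 'B2', 'B3'])

# Three values per group: coefficients are no longer forced to +1 or -1
three = rowwise_pearson(demo, ['C1', 'C2', 'C3'], ['B1', 'B2', 'B3'])
# Two values per group: the helper returns NaN instead of a misleading +1/-1
two = rowwise_pearson(demo, ['C1', 'C2'], ['B1', 'B2'])

print(demo[['Sam']].join(three))
print(demo[['Sam']].join(two))
```

Even with three values per group the test has very little power, so the p-values should be read with caution; the guard only removes the degenerate two-point case.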
