Pandas Correlation Groupby


Problem description

Assuming I have a dataframe similar to the below, how would I get the correlation between 2 specific columns and then group by the 'ID' column? I believe the Pandas 'corr' method finds the correlation between all columns. If possible I would also like to know how I could find the 'groupby' correlation using the .agg function (i.e. np.correlate).

What I have:

ID  Val1    Val2    OtherData   OtherData
A   5   4   x   x
A   4   5   x   x
A   6   6   x   x
B   4   1   x   x
B   8   2   x   x
B   7   9   x   x
C   4   8   x   x
C   5   5   x   x
C   2   1   x   x

What I need:

ID  Correlation_Val1_Val2
A   0.12
B   0.22
C   0.05

Thanks!

Solution

You pretty much figured out all the pieces, just need to combine them:

In [441]: df.groupby('ID')[['Val1','Val2']].corr()
Out[441]: 
             Val1      Val2
ID                         
A  Val1  1.000000  0.500000
   Val2  0.500000  1.000000
B  Val1  1.000000  0.385727
   Val2  0.385727  1.000000

In your case, printing out a 2x2 for each ID is excessively verbose. I don't see an option to print a scalar correlation instead of the whole matrix, but you can do something like:

In [442]: df.groupby('ID')[['Val1','Val2']].corr().iloc[0::2,-1]
Out[442]: 
ID       
A   Val1    0.500000
B   Val1    0.385727

And then rename and store things as you like.
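
As a rough sketch of that last step, the snippet below rebuilds the sample data from the question (leaving out the OtherData columns), flattens the selection into the ID / Correlation_Val1_Val2 layout that was asked for, and also shows one possible way to get a scalar per group directly. The variable names `pairwise`, `result`, and `direct` are illustrative only, and this assumes a reasonably recent pandas.

```python
import pandas as pd

# Sample data copied from the question (the OtherData columns are omitted).
df = pd.DataFrame({
    "ID":   ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})

# Per-ID correlation matrices, keeping only the Val1-row / Val2-column entry.
pairwise = df.groupby("ID")[["Val1", "Val2"]].corr().iloc[0::2, -1]

# Drop the inner "Val1" index level, rename, and turn ID back into a column.
result = (
    pairwise.droplevel(1)
    .rename("Correlation_Val1_Val2")
    .reset_index()
)

# Alternative: compute one scalar per group directly with Series.corr,
# which skips building the 2x2 matrix for each ID.
direct = df.groupby("ID")[["Val1", "Val2"]].apply(
    lambda g: g["Val1"].corr(g["Val2"])
)

print(result)
```

On the `.agg` / `np.correlate` part of the question: `.agg` works on one column at a time, so it cannot see both `Val1` and `Val2` together, and `np.correlate` computes a cross-correlation rather than a Pearson coefficient. That is why the sketch uses `.apply` with `Series.corr` instead.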
