Pandas Correlation Groupby
Assuming I have a dataframe similar to the one below, how would I get the correlation between two specific columns within each group of the 'ID' column? I believe the Pandas 'corr' method finds the correlation between all columns. If possible, I would also like to know how to compute the 'groupby' correlation using the .agg function (e.g. with np.correlate).
What I have:
ID Val1 Val2 OtherData OtherData
A 5 4 x x
A 4 5 x x
A 6 6 x x
B 4 1 x x
B 8 2 x x
B 7 9 x x
C 4 8 x x
C 5 5 x x
C 2 1 x x
What I need:
ID Correlation_Val1_Val2
A 0.12
B 0.22
C 0.05
Thanks!
You pretty much figured out all the pieces, just need to combine them:
In [441]: df.groupby('ID')[['Val1','Val2']].corr()
Out[441]:
             Val1      Val2
ID
A  Val1  1.000000  0.500000
   Val2  0.500000  1.000000
B  Val1  1.000000  0.385727
   Val2  0.385727  1.000000
In your case, printing out a 2x2 for each ID is excessively verbose. I don't see an option to print a scalar correlation instead of the whole matrix, but you can do something like:
In [442]: df.groupby('ID')[['Val1','Val2']].corr().iloc[0::2, -1]
Out[442]:
ID
A   Val1    0.500000
B   Val1    0.385727
And then rename and store things as you like.
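For a self-contained version of the idea above, here is a minimal sketch that returns one scalar per ID directly instead of slicing the 2x2 matrices. It assumes the sample data from the question with the OtherData columns left out. The helper name `corr_val1_val2` is made up for illustration, and `Series.corr` is used for the Pearson correlation.

```python
import pandas as pd

# Sample data copied from the question (the "OtherData" columns are omitted).
df = pd.DataFrame({
    "ID":   list("AAABBBCCC"),
    "Val1": [5, 4, 6, 4, 8, 7, 4, 5, 2],
    "Val2": [4, 5, 6, 1, 2, 9, 8, 5, 1],
})


def corr_val1_val2(group):
    """Hypothetical helper: Pearson correlation of Val1 and Val2 in one group."""
    return group["Val1"].corr(group["Val2"])


# Select the two columns first so each group passed to apply() holds only them,
# then name the resulting Series and turn the ID index back into a column.
result = (
    df.groupby("ID")[["Val1", "Val2"]]
      .apply(corr_val1_val2)
      .rename("Correlation_Val1_Val2")
      .reset_index()
)

print(result)
```

On the .agg part of the question: as far as I can tell, .agg hands each column to the function separately, so it never sees Val1 and Val2 together, which is why apply is used here. Also note that np.correlate computes a cross-correlation of two sequences rather than a Pearson coefficient, so it would not give the kind of values shown in the desired output.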