In Python, how do I compute the correlation between multiple columns (more than 2 variables)?
Problem description
I have a Pandas DataFrame like so:
id cat1 cat2 cat3 num1 num2
1 0 WN 29 2003 98
2 1 TX 12 755 76
3 0 WY 11 845 32
4 1 IL 19 935 46
I want to find the correlation between cat1 and the columns cat3, num1 and num2; or between cat1 and num1 and num2; or between cat2 and cat1, cat3, num1 and num2.
When I use df.corr(), it gives the correlation between all the columns in the DataFrame, but I want to see the correlation between just the selected columns detailed above.
How do I do that in Python pandas?
Recommended answer
I tried the following and it worked:
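# Column names are case-sensitive, so they must match the DataFrame exactly.
features1 = ['cat1', 'cat2', 'cat3']
features2 = ['cat1', 'cat2', 'num1', 'num2']
# cat2 holds strings; numeric_only=True (pandas 1.5+) skips it instead of raising an error.
df[features1].corr(numeric_only=True)
df[features2].corr(numeric_only=True)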
This is a good way to select just the columns you need when your dataset has a very large number of variables.
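For the "one column against several others" case in the question, a minimal sketch might look like the following. It rebuilds the sample table from the question and, as an assumption on my part rather than something the answer states, uses DataFrame.corrwith to get the correlation of cat1 against cat3, num1 and num2 as a single Series. The variable names subset_corr and cat1_vs_rest are made up for illustration. cat2 is left out because it holds strings and would need a numeric encoding first.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

# Pairwise Pearson correlation matrix, restricted to the chosen numeric columns.
subset_corr = df[["cat1", "cat3", "num1", "num2"]].corr()

# Correlation of a single column (cat1) against several others.
# The result is a Series indexed by the other column names.
cat1_vs_rest = df[["cat3", "num1", "num2"]].corrwith(df["cat1"])

print(subset_corr)
print(cat1_vs_rest)
```

The values in cat1_vs_rest should match the cat1 row of subset_corr, so corrwith is mainly a convenience when only one row of the full matrix is of interest.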