如何检查python中连续变量和分类变量之间的相关性? [英] how to check for correlation among continuous and categorical variables in python?

查看:1100
本文介绍了如何检查python中连续变量和分类变量之间的相关性?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含分类变量(二进制)和连续变量的数据集.我正在尝试应用线性回归模型来预测连续变量.有人可以让我知道如何检查分类变量和连续目标变量之间的相关性.

I have a dataset including categorical variables(binary) and continuous variables. I'm trying to apply a linear regression model for predicting a continuous variable. Can someone please let me know how to check for correlation among the categorical variables and the continuous target variable.

当前代码:

import pandas as pd
df_hosp = pd.read_csv('C:\Users\LAPPY-2\Desktop\LengthOfStay.csv')

data = df_hosp[['lengthofstay', 'male', 'female', 'dialysisrenalendstage', 'asthma', \
              'irondef', 'pneum', 'substancedependence', \
              'psychologicaldisordermajor', 'depress', 'psychother', \
              'fibrosisandother', 'malnutrition', 'hemo']]
print data.corr()

除lengthofstay以外的所有变量都是分类的.应该可以吗?

All of the variables apart from lengthofstay are categorical. Should this work?

推荐答案

将类别变量转换为虚拟变量

Convert your categorical variable into dummy variables here and put your variable in numpy.array. For example:

data.csv :

age,size,color_head
4,50,black
9,100,blonde
12,120,brown
17,160,black
18,180,brown

提取数据:

import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')

df:

将分类变量color_head转换为伪变量:

Convert categorical variable color_head into dummy variables:

df_dummies = pd.get_dummies(df['color_head'])
del df_dummies[df_dummies.columns[-1]]
df_new = pd.concat([df, df_dummies], axis=1)
del df_new['color_head']

df_new:

将其放入numpy数组:

Put that in numpy array:

x = df_new.values

计算相关性:

correlation_matrix = np.corrcoef(x.T)
print(correlation_matrix)

输出:

array([[ 1.        ,  0.99574691, -0.23658011, -0.28975028],
       [ 0.99574691,  1.        , -0.30318496, -0.24026862],
       [-0.23658011, -0.30318496,  1.        , -0.40824829],
       [-0.28975028, -0.24026862, -0.40824829,  1.        ]])

请参阅:

numpy.corrcoef

这篇关于如何检查python中连续变量和分类变量之间的相关性?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆