Pandas scatter_matrix - plot categorical variables


Question

I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data

I have loaded and processed the data using:

# import required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# load the data from the file
df = pd.read_csv('./data/train.csv')

# import the scatter_matrix functionality
# (it lives in pandas.plotting since pandas 0.20; the old pandas.tools.plotting path was later removed)
from pandas.plotting import scatter_matrix

# define colors list, to be used to plot survived either red (=0) or green (=1)
colors=['red','green']

# make a scatter plot
scatter_matrix(df,figsize=[20,20],marker='x',c=df.Survived.apply(lambda x:colors[x]))

df.info()

How can I add the categorical columns like Sex and Embarked to the plot?

Recommended answer

You need to transform the categorical variables into numbers to plot them.

Example (assuming that the column 'Sex' holds the gender data, with 'M' for males and 'F' for females):

import numpy as np
# start from NaN so that unrecognized values stay missing
df['Sex_int'] = np.nan
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1

Now all males are represented by 0 and females by 1. Unknown genders (if there are any) stay NaN and will be ignored.

The rest of your code should process the updated dataframe nicely.
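The same idea can be generalized to several categorical columns at once, including Embarked, without hard-coding the labels. The following is only a minimal sketch: the helper name `encode_categoricals`, the `_int` suffix convention, and the tiny sample frame are assumptions made for illustration, and the sample values are made up rather than taken from train.csv. It relies on `pd.factorize`, which numbers the distinct values in order of first appearance and marks missing entries with -1.

```python
import pandas as pd

def encode_categoricals(df, columns):
    """Return a copy of df with an integer-coded '<col>_int' column per listed column."""
    out = df.copy()
    for col in columns:
        # pd.factorize assigns 0, 1, 2, ... in order of first appearance;
        # missing values receive the sentinel -1
        codes, uniques = pd.factorize(out[col])
        # turn the -1 sentinel back into NaN so missing values stay missing
        out[col + "_int"] = pd.Series(codes, index=out.index).where(codes >= 0)
    return out

# Hypothetical stand-in for the real data (illustrative values only)
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", None, "Q"],
})

encoded = encode_categoricals(sample, ["Sex", "Embarked"])
```

The encoded frame can then be passed to scatter_matrix exactly as in the question. Since scatter_matrix draws only numeric columns, the new `Sex_int` and `Embarked_int` columns should show up in the grid while the original string columns are left out. Keep in mind that the integer codes are arbitrary labels, so distances between them carry no meaning.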
