Side-by-side boxplot of multiple columns of a pandas DataFrame

Question

One year of sample data:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

I want to boxplot these data side-by-side grouped by the month (i.e., two boxes per month, one for A and one for B).

For a single column sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').
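The TypeError most likely arises because id_vars expects column labels, not the index values themselves. A minimal sketch of one way around it, assuming it is acceptable to turn the index into an ordinary column first (the names date, wide, long and the seeded generator rng are illustrative, not from the original post):

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded generator, used here only for reproducibility
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

# Turn the DatetimeIndex into a regular column called "date" (hypothetical name),
# so that melt can refer to it by label.
wide = df.rename_axis("date").reset_index()

# Long format: one row per (date, column) pair, with the numbers in "value".
long = wide.melt(id_vars="date", value_vars=["A", "B"], var_name="column")

# The month can then be derived from the date column for grouping.
long["month"] = long["date"].dt.month
```

With the data in this shape, the month column can serve as x, value as y, and column as hue in a seaborn boxplot call.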

(A solution doesn't necessarily need to use seaborn, if it is easier using plain matplotlib.)
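For reference, a plain matplotlib version is possible too, though it takes more bookkeeping. The sketch below is only one way to do it, assuming matplotlib 3.1 or later (for the manage_ticks argument); the offsets, widths and colours are arbitrary choices, not anything from the original post:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only so the figure is reproducible
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

months = np.arange(1, 13)
width = 0.35  # horizontal room reserved for each box (arbitrary choice)

fig, ax = plt.subplots(figsize=(10, 4))
artists = {}
for offset, col, color in [(-width / 2, "A", "tab:blue"), (width / 2, "B", "tab:orange")]:
    # One array of values per month for this column.
    data = [df.loc[df.index.month == m, col].to_numpy() for m in months]
    artists[col] = ax.boxplot(
        data,
        positions=months + offset,  # shift left/right so the two boxes sit side by side
        widths=width * 0.9,
        patch_artist=True,          # boxes become patches that can be filled
        manage_ticks=False,         # keep boxplot from overwriting the x ticks
    )
    for box in artists[col]["boxes"]:
        box.set_facecolor(color)

ax.set_xticks(months)
ax.set_xlabel("month")
ax.legend([artists["A"]["boxes"][0], artists["B"]["boxes"][0]], ["A", "B"])
```

Calling plt.show() or fig.savefig(...) afterwards displays or stores the figure.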

Edit

I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant/direct way to do it, please share!

df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
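The awkwardness with extra columns can be reduced by selecting the columns to plot before stacking. A sketch of that variant, assuming a third column C that should be ignored (C and the seeded generator rng are illustrative additions):

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(
    {
        "A": rng.standard_normal(n),
        "B": rng.standard_normal(n) + 1,
        "C": rng.standard_normal(n) + 10,  # extra column that should not be plotted
    },
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

df_stacked = (
    df[["A", "B"]]                              # keep only the columns to plot
    .rename_axis(index="date", columns="vars")  # name both axes before stacking
    .stack()                                    # Series indexed by (date, vars)
    .rename("vals")                             # name of the value column
    .reset_index()                              # columns: date, vars, vals
)
df_stacked["month"] = df_stacked["date"].dt.month
```

Naming the axes up front avoids assigning df_stacked.columns by hand, and month is an ordinary column that can be passed to seaborn as x="month".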

Produces: (figure not preserved; a boxplot grouped by month with one box each for A and B)

Solution

Here's a solution using pandas melt and seaborn:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
                          "B": rnd.randn(n)+1,
                          "C": rnd.randn(n) + 10, # will not be plotted
                         },
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
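# Store the month as a regular column so melt can use it as an identifier
df['month'] = df.index.month
# Long format with columns month, variable, value; column "C" is left out
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)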
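To check that the melted frame groups the way the plot suggests, the per-month medians can be computed directly from it. This is only a sanity-check sketch; the seeded generator rng and the name medians are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(
    {
        "A": rng.standard_normal(n),
        "B": rng.standard_normal(n) + 1,
        "C": rng.standard_normal(n) + 10,  # will not be plotted
    },
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)
df["month"] = df.index.month

# melt names the new columns "variable" and "value" by default,
# which is why the plotting call uses hue='variable' and y='value'.
df_plot = df.melt(id_vars="month", value_vars=["A", "B"])

# One median per (month, variable): the centre line of each box in the plot.
medians = (
    df_plot.groupby(["month", "variable"])["value"]
    .median()
    .unstack("variable")
)
```

The result has one row per month and one column each for A and B, so each cell corresponds to one box.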
