Python 2.7 and Pandas Boxplot connecting median values


Question


It seems like plotting a line connecting the mean values of box plots would be a simple thing to do, but I couldn't figure out how to do this plot in pandas.

I'm using this syntax to draw the boxplot so that it automatically generates the box plot for Y vs. X device without any external manipulation of the data frame:

df.boxplot(column='Y_Data', by="Category", showfliers=True, showmeans=True)

One way I thought of doing this is to draw a line plot from the mean values shown in the boxplot, but I'm not sure how to extract that information from the plot.

Solution

You can save the Axes object returned by df.boxplot() and plot the means as a line plot on that same Axes. I'd suggest using Seaborn's pointplot for the lines, as it handles a categorical x-axis nicely.

First let's generate some sample data:

import pandas as pd
import numpy as np
import seaborn as sns

N = 150
values = np.random.random(size=N)
groups = np.random.choice(['A','B','C'], size=N)
df = pd.DataFrame({'value':values, 'group':groups})

print(df.head())
  group     value
0     A  0.816847
1     A  0.468465
2     C  0.871975
3     B  0.933708
4     A  0.480170
              ...

Next, make the boxplot and save the axis object:

# Place the boxes at x = 0..n-1 so they line up with Seaborn's categorical axis
ax = df.boxplot(column='value', by='group', showfliers=True,
                positions=range(df['group'].nunique()))

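Note: boxplot() in Pyplot/Pandas places the boxes at x = 1..n by default, whereas Seaborn's categorical plots use x = 0..n-1, so overlaying the two without adjustment shifts the line by one category. Passing the positions argument explicitly, as above, is the workaround. The linked discussion covers this in more detail.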
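To see the offset concretely, here is a minimal sketch that compares the x tick locations with and without the positions argument. The tiny dataset, the variable names, and the Agg backend are assumptions made for illustration; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data with three categories
df = pd.DataFrame({"group": list("AABBCC"),
                   "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
n_groups = df["group"].nunique()

# Default behaviour: boxes are drawn at 1-based x positions
fig1, ax_default = plt.subplots()
df.boxplot(column="value", by="group", ax=ax_default)
default_ticks = [float(t) for t in ax_default.get_xticks()]

# With explicit 0-based positions, matching a categorical axis
fig2, ax_zero = plt.subplots()
df.boxplot(column="value", by="group", ax=ax_zero,
           positions=list(range(n_groups)))
zero_ticks = [float(t) for t in ax_zero.get_xticks()]

print("default:", default_ticks)
print("zero-based:", zero_ticks)
```

If the two lists differ by one, any line drawn at 0-based category positions would land one box to the left unless positions is set.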

Finally, use groupby to get category means, and then connect mean values with a line plot overlaid on top of the boxplot:

sns.pointplot(x='group', y='value', data=df.groupby('group', as_index=False).mean(), ax=ax)

Your title mentions "median" but you talk about category means in your post. I used means here; change the groupby aggregation to median() if you want to plot medians instead.
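As a rough sketch of the median variant, the following uses plain Matplotlib's ax.plot instead of Seaborn's pointplot, which is a swapped-in alternative and not what the answer above uses. The seeded random generator, the Agg backend, and names such as median_line are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data in the same shape as above; the seed is an assumption
rng = np.random.default_rng(0)
N = 150
df = pd.DataFrame({"value": rng.random(N),
                   "group": rng.choice(["A", "B", "C"], size=N)})

# Per-category medians; groupby sorts the keys, matching the box order
medians = df.groupby("group")["value"].median()
x = list(range(len(medians)))

fig, ax = plt.subplots()
# 0-based positions so the boxes and the line share x coordinates
df.boxplot(column="value", by="group", ax=ax, positions=x)

# Connect the medians with a line drawn on the same Axes
(median_line,) = ax.plot(x, medians.to_numpy(), marker="o",
                         color="tab:red", label="median")
ax.legend()
```

Swapping median() for mean() gives the line through the category means instead.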
