Seaborn: Making barplot by group with asymmetrical custom error bars
Problem description
I have a Pandas dataframe that has a couple of group columns like below.
    grp1 grp2 variables    lb      m    ub
    A    A1   V1         1.00   1.50   2.5
    A    A2   V2         1.50   2.50   3.5
    B    A1   V1         3.50  14.50  30.5
    B    A2   V2         0.25   0.75   1.0
I am trying to get a separate sub-barplot for each variable in `variables` using `FacetGrid`. I am trying to build up to the final plot that I need, which looks like the below.

[Image: desired final plot]

This is what I have so far.

    g = sns.FacetGrid(df, col="variables", hue="grp1")
    g.map(sns.barplot, 'grp2', 'm', order=times)
But unfortunately this is stacking all my datapoints.
How should I go about doing this with `Seaborn`?

UPDATE: The following code largely does what I'm after but currently does not display `yerr`.

    g = sns.factorplot(x="Grp2", y="m", hue="Grp1", col="variables", data=df,
                       kind="bar", size=4, aspect=.7, sharey=False)
How can I incorporate the `lb` and `ub` as error bars on the factorplot?

Solution

Before we start, let me mention that matplotlib requires the errors to be relative to the data, not absolute boundaries. We therefore modify the dataframe to account for that by subtracting the respective columns.
    u = u"""grp1 grp2 variables lb m ub
    A A1 V1 1.00 1.50 2.5
    A A2 V2 1.50 2.50 3.5
    B A1 V1 7.50 14.50 20.5
    B A2 V2 0.25 0.75 1.0
    A A2 V1 1.00 6.50 8.5
    A A1 V2 1.50 3.50 6.5
    B A2 V1 3.50 4.50 15.5
    B A1 V2 8.25 12.75 13.9"""

    import io
    import pandas as pd

    # sep=r"\s+" splits on whitespace (replaces the deprecated delim_whitespace=True)
    df = pd.read_csv(io.StringIO(u), sep=r"\s+")
    # errors must be relative to data (not absolute bounds)
    df["lb"] = df["m"]-df["lb"]
    df["ub"] = df["ub"]-df["m"]
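As a quick check of that conversion, here is a minimal sketch using two rows copied from the data above. The names `lower`, `upper` and `yerr` are only illustrative. It also shows the layout matplotlib accepts for asymmetric errors: a two-row sequence, lower distances first, upper distances second.

```python
import pandas as pd

# Two rows copied from the example data: absolute bounds lb/ub around m
df = pd.DataFrame({"lb": [1.00, 7.50], "m": [1.50, 14.50], "ub": [2.5, 20.5]})

# Distances from the bar height down to lb and up to ub
lower = df["m"] - df["lb"]
upper = df["ub"] - df["m"]

# Asymmetric yerr layout for matplotlib: row 0 = lower, row 1 = upper
yerr = [lower.to_numpy(), upper.to_numpy()]
print(yerr)
```

Both rows of `yerr` must be non-negative, which is why the bounds have to satisfy lb <= m <= ub.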
Now there are two solutions, which are essentially the same. Let's start with a solution which does not use seaborn, but the pandas plotting wrapper (the reason will become clear later).
Not using Seaborn
Pandas can plot grouped barplots from dataframes in which each column belongs to or constitutes one group. The steps to take are therefore:
- create a number of subplots according to the number of different `variables`.
- `groupby` the dataframe by `variables`
- for each group, create a pivoted dataframe, which has the values of `grp1` as columns and the `m` as values. Do the same for the two error columns.
- Apply the solution from How add asymmetric errorbars to Pandas grouped barplot?
The code would then look like:
    import io
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # `u` is the data string defined above
    df = pd.read_csv(io.StringIO(u), sep=r"\s+")
    # errors must be relative to data (not absolute bounds)
    df["lb"] = df["m"]-df["lb"]
    df["ub"] = df["ub"]-df["m"]

    def func(x,y,h,lb,ub, **kwargs):
        data = kwargs.pop("data")
        # from https://stackoverflow.com/a/37139647/4124317
        errLo = data.pivot(index=x, columns=h, values=lb)
        errHi = data.pivot(index=x, columns=h, values=ub)
        # one [lower, upper] pair per column (i.e. per level of h)
        err = []
        for col in errLo:
            err.append([errLo[col].values, errHi[col].values])
        err = np.abs(err)
        p = data.pivot(index=x, columns=h, values=y)
        p.plot(kind='bar',yerr=err,ax=plt.gca(), **kwargs)

    fig, axes = plt.subplots(ncols=len(df.variables.unique()))
    for ax, (name, group) in zip(axes,df.groupby("variables")):
        plt.sca(ax)
        func("grp2", "m", "grp1", "lb", "ub", data=group, color=["limegreen", "indigo"])
        plt.title(name)

    plt.show()
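To see what the pivot step hands to the plotting call, the following sketch pivots a single group by hand. It assumes the four `V1` rows of the data above with `lb` and `ub` already converted to relative errors. The names `v1`, `heights`, `err_lo` and `err_hi` are illustrative and not part of the original answer.

```python
import io
import numpy as np
import pandas as pd

# The V1 group of the example data, with lb/ub already relative to m
v1 = pd.read_csv(io.StringIO("""grp1 grp2 lb m ub
A A1 0.5 1.5 1.0
B A1 7.0 14.5 6.0
A A2 5.5 6.5 2.0
B A2 1.0 4.5 11.0"""), sep=r"\s+")

# Rows become the x categories (grp2), columns become the bar groups (grp1)
heights = v1.pivot(index="grp2", columns="grp1", values="m")
err_lo = v1.pivot(index="grp2", columns="grp1", values="lb")
err_hi = v1.pivot(index="grp2", columns="grp1", values="ub")

# One [lower, upper] pair per column: shape (n_columns, 2, n_rows)
err = np.array([[err_lo[c].values, err_hi[c].values] for c in heights.columns])
print(heights)
print(err.shape)
```

Each column of `heights` is drawn as one set of bars, and the matching slice `err[i]` carries that column's lower and upper errors.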
Using Seaborn
Seaborn factorplot does not allow for custom errorbars. One would therefore need to use the `FacetGrid` approach. In order not to have the bars stacked, one would put the `hue` argument in the `map` call. The following is thus the equivalent of the `sns.factorplot` call from the question.

    # `height` is the current name of the older `size` argument (seaborn >= 0.9)
    g = sns.FacetGrid(data=df, col="variables", height=4, aspect=.7)
    g.map(sns.barplot, "grp2", "m", "grp1", order=["A1","A2"])
Now the problem is that we cannot get the errorbars into the barplot from the outside or, more importantly, we cannot give the errors for a grouped barchart to `seaborn.barplot`. For a non-grouped barplot one would be able to supply the error via the `yerr` argument, which is passed on to the matplotlib `plt.bar` plot. This concept is shown in this question. However, since `seaborn.barplot` calls `plt.bar` several times, once for each `hue`, the errors in each call would be the same (or their dimension wouldn't match).

The only option I see is hence to use a `FacetGrid` and map exactly the same function as used above to it. This somewhat defeats the purpose of using seaborn, but for completeness, here is the `FacetGrid` solution.

    import io
    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # `u` is the data string defined above
    df = pd.read_csv(io.StringIO(u), sep=r"\s+")
    # errors must be relative to data (not absolute bounds)
    df["lb"] = df["m"]-df["lb"]
    df["ub"] = df["ub"]-df["m"]

    def func(x,y,h,lb,ub, **kwargs):
        data = kwargs.pop("data")
        # from https://stackoverflow.com/a/37139647/4124317
        errLo = data.pivot(index=x, columns=h, values=lb)
        errHi = data.pivot(index=x, columns=h, values=ub)
        err = []
        for col in errLo:
            err.append([errLo[col].values, errHi[col].values])
        err = np.abs(err)
        p = data.pivot(index=x, columns=h, values=y)
        p.plot(kind='bar',yerr=err,ax=plt.gca(), **kwargs)

    # `height` is the current name of the older `size` argument (seaborn >= 0.9)
    g = sns.FacetGrid(df, col="variables", height=4, aspect=.7)
    g.map_dataframe(func, "grp2", "m", "grp1", "lb", "ub", color=["limegreen", "indigo"])
    g.add_legend()

    plt.show()
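For comparison, the idea of calling the bar function once per hue level, each time with its own errors, can also be written directly against matplotlib. This is only a sketch of a different technique than the pandas wrapper used above. It hard-codes the `V1` values with relative errors, and the names `heights`, `errors` and `containers` are illustrative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop for interactive use
import matplotlib.pyplot as plt

categories = ["A1", "A2"]                      # grp2 levels on the x axis
heights = {"A": [1.5, 6.5], "B": [14.5, 4.5]}  # m for V1, per grp1 level
# Relative errors per grp1 level: row 0 = lower, row 1 = upper
errors = {"A": [[0.5, 5.5], [1.0, 2.0]],
          "B": [[7.0, 1.0], [6.0, 11.0]]}

x = np.arange(len(categories))
width = 0.4
fig, ax = plt.subplots()
containers = []
for i, level in enumerate(heights):
    # One ax.bar call per hue level, shifted sideways, with its own yerr
    bars = ax.bar(x + (i - 0.5) * width, heights[level], width,
                  yerr=errors[level], capsize=3, label=level)
    containers.append(bars)
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend(title="grp1")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Repeating the loop on each axes of a subplot grid would give one panel per variable.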