如何使用python(Pandas)形成堆积条形图簇 [英] How to have clusters of stacked bars with python (Pandas)

查看:101
本文介绍了如何使用python(Pandas)形成堆积条形图簇的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这就是我的数据集的样子:

So here is how my data set looks like :

In [1]: df1=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [2]: df2=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [3]: df1
Out[3]: 
          I         J
A  0.675616  0.177597
B  0.675693  0.598682
C  0.631376  0.598966
D  0.229858  0.378817

In [4]: df2
Out[4]: 
          I         J
A  0.939620  0.984616
B  0.314818  0.456252
C  0.630907  0.656341
D  0.020994  0.538303

我想为每个数据框都具有堆积条形图,但是由于它们具有相同的索引,我希望每个索引具有2个堆积条形图.

我试图在相同的轴上进行绘制:

I've tried to plot both on the same axes :

In [5]: ax = df1.plot(kind="bar", stacked=True)

In [5]: ax2 = df2.plot(kind="bar", stacked=True, ax = ax)

但是它重叠了.

然后我尝试先合并两个数据集:

Then I tried to concat the two dataset first :

pd.concat(dict(df1 = df1, df2 = df2),axis = 1).plot(kind="bar", stacked=True)

但是这里一切都堆积了

我最好的尝试是:

 pd.concat(dict(df1 = df1, df2 = df2),axis = 0).plot(kind="bar", stacked=True)

哪个给:

这基本上是我想要的,除了我希望将酒吧订购为

This is basically what I want, except that I want the bar ordered as

(df1,A)(df2,A)(df1,B)(df2,B)等...

(df1,A) (df2,A) (df1,B) (df2,B) etc...

我猜有一个把戏,但是我找不到!

I guess there is a trick but I can't found it !

在@bgschiller回答之后,我得到了:

After @bgschiller's answer I got this :

几乎是我想要的.我希望该条按索引聚类,以便使内容视觉上清晰.

Which is almost what I want. I would like the bar to be clustered by index, in order to have something visually clear.

奖金:x标签不是多余的,类似于:

Bonus : Having the x-label not redundant, something like :

df1 df2    df1 df2
_______    _______ ...
   A          B

感谢您的帮助.

推荐答案

所以,我最终找到了一个窍门(有关使用Seaborn和longform数据帧的信息,请参见下文)

So, I eventually found a trick (edit: see below for using seaborn and longform dataframe):

这里有一个更完整的示例:

Here it is with a more complete example :

import pandas as pd
import matplotlib.cm as cm
import numpy as np
import matplotlib.pyplot as plt

def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot",  H="/", **kwargs):
    """Given a list of dataframes, with identical columns and index, create a clustered stacked bar plot. 
labels is a list of the names of the dataframe, used for the legend
title is a string for the title of the plot
H is the hatch used for identification of the different dataframe"""

    n_df = len(dfall)
    n_col = len(dfall[0].columns) 
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall : # for each data frame
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)  # make bar plots

    h,l = axe.get_legend_handles_labels() # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col): # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches: # for each index
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col)) #edited part     
                rect.set_width(1 / float(n_df + 1))

    axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
    axe.set_xticklabels(df.index, rotation = 0)
    axe.set_title(title)

    # Add invisible data to add another legend
    n=[]        
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1]) 
    axe.add_artist(l1)
    return axe

# create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"], 
                   columns=["I", "J", "K", "L", "M"])

# Then, just call :
plot_clustered_stacked([df1, df2, df3],["df1", "df2", "df3"])

它给出了:

您可以通过传递cmap参数来更改条形的颜色:

You can change the colors of the bar by passing a cmap argument:

plot_clustered_stacked([df1, df2, df3],
                       ["df1", "df2", "df3"],
                       cmap=plt.cm.viridis)


Seaborn解决方案:

鉴于以下相同的df1,df2,df3,我将它们转换为长格式:


Solution with seaborn:

Given the same df1, df2, df3, below, I convert them in a long form:

df1["Name"] = "df1"
df2["Name"] = "df2"
df3["Name"] = "df3"
dfall = pd.concat([pd.melt(i.reset_index(),
                           id_vars=["Name", "index"]) # transform in tidy format each df
                   for i in [df1, df2, df3]],
                   ignore_index=True)

seaborn的问题是它不能自然地堆积条形图,因此诀窍是将每个条形图的累积总和绘制在彼此的顶部:

The problem with seaborn is that it doesn't stack bars natively, so the trick is to plot the cumulative sum of each bar on top of each other:

dfall.set_index(["Name", "index", "variable"], inplace=1)
dfall["vcs"] = dfall.groupby(level=["Name", "index"]).cumsum()
dfall.reset_index(inplace=True) 

>>> dfall.head(6)
  Name index variable     value       vcs
0  df1     A        I  0.717286  0.717286
1  df1     B        I  0.236867  0.236867
2  df1     C        I  0.952557  0.952557
3  df1     D        I  0.487995  0.487995
4  df1     A        J  0.174489  0.891775
5  df1     B        J  0.332001  0.568868

然后循环遍历每组variable并绘制累积和:

Then loop over each group of variable and plot the cumulative sum:

c = ["blue", "purple", "red", "green", "pink"]
for i, g in enumerate(dfall.groupby("variable")):
    ax = sns.barplot(data=g[1],
                     x="index",
                     y="vcs",
                     hue="Name",
                     color=c[i],
                     zorder=-i, # so first bars stay on top
                     edgecolor="k")
ax.legend_.remove() # remove the redundant legends 

它缺少可以轻松添加的图例.问题在于,我们没有阴影(可以轻松添加)来区分数据帧,而是具有亮度梯度,并且对于第一个而言,它有点太轻了,我真的不知道如何在不更改每个阴影的情况下进行更改矩形一个接一个(如第一个解决方案).

It lacks the legend that can be added easily I think. The problem is that instead of hatches (which can be added easily) to differentiate the dataframes we have a gradient of lightness, and it's a bit too light for the first one, and I don't really know how to change that without changing each rectangle one by one (as in the first solution).

如果您不了解代码中的内容,请告诉我.

Tell me if you don't understand something in the code.

随时可以重复使用CC0下的此代码.

Feel free to re-use this code which is under CC0.

这篇关于如何使用python(Pandas)形成堆积条形图簇的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆