How to have clusters of stacked bars with python (Pandas)


Problem description

My dataset looks like this:

In [1]: df1=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [2]: df2=pd.DataFrame(np.random.rand(4,2),index=["A","B","C","D"],columns=["I","J"])

In [3]: df1
Out[3]: 
          I         J
A  0.675616  0.177597
B  0.675693  0.598682
C  0.631376  0.598966
D  0.229858  0.378817

In [4]: df2
Out[4]: 
          I         J
A  0.939620  0.984616
B  0.314818  0.456252
C  0.630907  0.656341
D  0.020994  0.538303

I would like to have a stacked bar plot for each dataframe, but since they share the same index, I'd like to have 2 stacked bars per index.

I've tried to plot both on the same axes:

In [5]: ax = df1.plot(kind="bar", stacked=True)

In [5]: ax2 = df2.plot(kind="bar", stacked=True, ax = ax)

But the bars overlap.

Then I tried to concatenate the two datasets first:

pd.concat(dict(df1 = df1, df2 = df2),axis = 1).plot(kind="bar", stacked=True)

But here everything is stacked into a single bar per index.

My best try is:

 pd.concat(dict(df1 = df1, df2 = df2),axis = 0).plot(kind="bar", stacked=True)

That gives a plot that is basically what I want, except that I want the bars ordered as

(df1,A) (df2,A) (df1,B) (df2,B) etc...

I guess there is a trick but I can't find it!
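
The ordering described above can be sketched by swapping the two index levels of the concatenated frame and sorting. This is only an illustration of the reordering step, not necessarily what any answer in this thread used; the names `stacked` and `reordered` are made up for the example.

```python
import numpy as np
import pandas as pd

idx = ["A", "B", "C", "D"]
df1 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])
df2 = pd.DataFrame(np.random.rand(4, 2), index=idx, columns=["I", "J"])

# Stack the frames vertically; the dict keys become the outer index level.
stacked = pd.concat(dict(df1=df1, df2=df2), axis=0)

# Swap the levels so the original index comes first, then sort so the rows
# read (A, df1), (A, df2), (B, df1), (B, df2), ...
reordered = stacked.swaplevel().sort_index()
print(reordered.index.tolist()[:4])
```

Calling `reordered.plot(kind="bar", stacked=True)` would draw the bars in that order, but they would still be evenly spaced rather than grouped by index.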

After @bgschiller's answer I got a plot that is almost what I want. I would like the bars to be clustered by index, so that the result is visually clear.

Bonus: having x-labels that are not redundant, something like:

df1 df2    df1 df2
_______    _______ ...
   A          B

Thanks for helping.

Recommended answer

I eventually found a trick (edit: see below for a version using seaborn and a long-form dataframe):

Here it is with a more complete example:

import pandas as pd
import matplotlib.cm as cm
import numpy as np
import matplotlib.pyplot as plt

def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot",  H="/", **kwargs):
    """Given a list of dataframes with identical columns and index, create a clustered stacked bar plot.

    labels is a list of the names of the dataframes, used for the legend
    title is a string for the title of the plot
    H is the hatch used to identify the different dataframes
    """

    n_df = len(dfall)
    n_col = len(dfall[0].columns) 
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)

    for df in dfall : # for each data frame
        axe = df.plot(kind="bar",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      **kwargs)  # make bar plots

    h,l = axe.get_legend_handles_labels() # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col): # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches: # for each index
                rect.set_x(rect.get_x() + 1 / float(n_df + 1) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col)) #edited part     
                rect.set_width(1 / float(n_df + 1))

    axe.set_xticks((np.arange(0, 2 * n_ind, 2) + 1 / float(n_df + 1)) / 2.)
    axe.set_xticklabels(df.index, rotation = 0)
    axe.set_title(title)

    # Add invisible data to add another legend
    n=[]        
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i))

    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1]) 
    axe.add_artist(l1)
    return axe

# create fake dataframes
df1 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df2 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"],
                   columns=["I", "J", "K", "L", "M"])
df3 = pd.DataFrame(np.random.rand(4, 5),
                   index=["A", "B", "C", "D"], 
                   columns=["I", "J", "K", "L", "M"])

# Then, just call :
plot_clustered_stacked([df1, df2, df3],["df1", "df2", "df3"])
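
The repositioning done inside `plot_clustered_stacked` can be checked in isolation. The sketch below only reproduces the arithmetic of the `set_x` and `set_width` calls; the helper name `cluster_offsets` is hypothetical and is not part of the answer's code.

```python
import numpy as np

def cluster_offsets(n_df):
    """Return (shifts, width) for placing n_df bars side by side in one cluster."""
    width = 1 / float(n_df + 1)       # same width the function assigns to every bar
    shifts = np.arange(n_df) * width  # shift added to the bars of the k-th dataframe
    return shifts, width

shifts, width = cluster_offsets(3)
print(shifts, width)
```

With three dataframes each bar is 0.25 wide and the bars of the second and third dataframe are moved right by 0.25 and 0.5. A cluster therefore spans 0.75 of the unit spacing between index positions, which leaves a gap of one bar width between neighbouring clusters.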
    

This produces the clustered stacked bar plot.

You can change the colors of the bars by passing a cmap argument:

plot_clustered_stacked([df1, df2, df3],
                       ["df1", "df2", "df3"],
                       cmap=plt.cm.viridis)


Solution with seaborn:

Given the same df1, df2, df3 defined above, I convert them to long form:

df1["Name"] = "df1"
df2["Name"] = "df2"
df3["Name"] = "df3"
dfall = pd.concat([pd.melt(i.reset_index(),
                           id_vars=["Name", "index"]) # transform in tidy format each df
                   for i in [df1, df2, df3]],
                   ignore_index=True)
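
To see what the long-form conversion produces, here is a minimal sketch on a two-row frame with made-up values. The names `small` and `long_form` are assumptions for the illustration only.

```python
import pandas as pd

small = pd.DataFrame({"I": [1.0, 2.0], "J": [3.0, 4.0]}, index=["A", "B"])
small["Name"] = "df1"

# reset_index() turns the unnamed index into a column called "index";
# melt() then produces one row per (index, variable) pair.
long_form = pd.melt(small.reset_index(), id_vars=["Name", "index"])
print(long_form)
```

Each original cell becomes one row, with the former column label stored in `variable` and the number in `value`.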

The problem with seaborn is that it doesn't stack bars natively, so the trick is to plot the cumulative sum of each bar on top of each other:

dfall.set_index(["Name", "index", "variable"], inplace=True)
dfall["vcs"] = dfall.groupby(level=["Name", "index"])["value"].cumsum()
dfall.reset_index(inplace=True) 

>>> dfall.head(6)
  Name index variable     value       vcs
0  df1     A        I  0.717286  0.717286
1  df1     B        I  0.236867  0.236867
2  df1     C        I  0.952557  0.952557
3  df1     D        I  0.487995  0.487995
4  df1     A        J  0.174489  0.891775
5  df1     B        J  0.332001  0.568868
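
The cumulative-sum step can be checked on a tiny long-form frame. This sketch uses made-up values and groups on columns instead of index levels, which is an assumption made for brevity rather than the answer's exact code.

```python
import pandas as pd

long_form = pd.DataFrame({
    "Name": ["df1"] * 4,
    "index": ["A", "B", "A", "B"],
    "variable": ["I", "I", "J", "J"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Running total of "value" within each (Name, index) bar, in row order:
# the "J" rows end up holding I + J, i.e. the top of the stacked bar.
long_form["vcs"] = long_form.groupby(["Name", "index"])["value"].cumsum()
print(long_form)
```

Plotting `vcs` for each variable as an ordinary bar, with the shorter bars drawn in front of the taller ones, gives the appearance of a stack.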

Then loop over each group of variable and plot the cumulative sum:

import seaborn as sns

c = ["blue", "purple", "red", "green", "pink"]
for i, g in enumerate(dfall.groupby("variable")):
    ax = sns.barplot(data=g[1],
                     x="index",
                     y="vcs",
                     hue="Name",
                     color=c[i],
                     zorder=-i, # so first bars stay on top
                     edgecolor="k")
ax.legend_.remove() # remove the redundant legends 

It lacks the legend, which I think can be added easily. The problem is that instead of hatches (which can be added easily) to differentiate the dataframes, we have a gradient of lightness, and it's a bit too light for the first one. I don't really know how to change that without changing each rectangle one by one (as in the first solution).
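
One possible way to add the missing legend is to build proxy handles by hand with matplotlib's `Patch`. The sketch below is an assumption about how that could look, not part of the original answer; it uses a fresh, empty axes as a stand-in for the `ax` returned by the seaborn loop.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

variables = ["I", "J", "K", "L", "M"]                # column names of the dataframes
colors = ["blue", "purple", "red", "green", "pink"]  # same list as c in the loop

fig, ax = plt.subplots()  # stand-in for the axes returned by sns.barplot

# One colored proxy patch per variable; nothing is drawn on the axes itself.
handles = [Patch(facecolor=col, edgecolor="k", label=var)
           for var, col in zip(variables, colors)]
legend = ax.legend(handles=handles, title="variable", loc=[1.01, 0.5])
```

The pairing of colors and labels relies on `dfall.groupby("variable")` iterating over the groups in sorted order, which matches the order of `variables` here.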

Tell me if you don't understand something in the code.

Feel free to re-use this code, which is under CC0.
