Python DataFrame-用分组列(至少两列)为数据框绘制条形图 [英] Python DataFrame - plot a bar chart for data frame with grouped-by columns (at least two columns)

查看:121
本文介绍了Python DataFrame-用分组列(至少两列)为数据框绘制条形图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在努力使用matlplotlib在python中重新创建此Excel图形:

数据在数据框中;我正在尝试自动化生成该图的过程.

我尝试拆开数据框,进行子图绘制,但是我没有设法创建"Zone"区域.索引在Excel中是如此优雅.我已经成功地在没有此区域"的情况下绘制了图表.索引,但这并不是我真正想做的.

这是我的代码:

  data = pd.DataFrame({工厂区":["AMERICAS","APAC","APAC","APAC","APAC","APAC","EMEA","EMEA","EMEA","EMEA"],'工厂名称':[巧克力工厂",蜡笔工厂","Jobs Ur Us","Gibberish US","Lil Grey","Toys R Us","Food Inc.",宠物店",邦邦工厂",家乐福"],生产第1天":[24,1,9,29,92,79,4,90,42,35],生产第2天":[2,43,17,5,31,89,44,49,34,84]})df = pd.DataFrame(data)打印(df)#没有FactoryZone,它可以工作:df = df.drop(['工厂区域'],轴= 1)图片= df.plot(种类=栏") 

数据如下:

 未命名:0 FactoryZone工厂名称生产第1天生产第2天0 1美洲巧克力工厂24 431 2美国蜡笔厂1 172 3 EMEA宠物店9 53 4 EMEA糖果工厂29 314 5亚太地区Lil Gray 92 895 6美国乔布斯我们79 446 7亚太地区玩具反斗城4 497 8 EMEA家乐福90 348 9美国胡言乱语美国42 849 10亚太食品公司35 62 

解决方案

您可以通过首先创建



参考文献:Quang Hoang的答案, gyx-hh的答案

I've been struggling to recreate this Excel graph in python using matlplotlib:

The data is in a dataframe; I'm trying to automate the process of generating this graph.

I've tried unstacking my dataframe, subplotting, but I haven't managed to create the "Zone" index which is so elegant in Excel. I have successfully managed to plot the graph without this "Zone" index, but that's not really what I want to do.

Here is my code:

data = pd.DataFrame(
    {
        'Factory Zone':
        ["AMERICAS","APAC","APAC","APAC","APAC","APAC","EMEA","EMEA","EMEA","EMEA"],
        'Factory Name':
        ["Chocolate Factory","Crayon Factory","Jobs Ur Us", "Gibberish US","Lil Grey", "Toys R Us","Food Inc.",
        "Pet Shop", "Bonbon Factory","Carrefour"],
        'Production Day 1':
        [24,1,9,29,92,79,4,90,42,35],
        'Production Day 2':
        [2,43,17,5,31,89,44,49,34,84]
    })
df = pd.DataFrame(data)
print(df)
# Without FactoryZone, it works:
df = df.drop(['Factory Zone'], axis=1)
image = df.plot(kind="bar")

And the data looks like this:

  Unnamed: 0 FactoryZone       Factory Name  Production Day 1  Production Day 2
0           1    AMERICAS  Chocolate Factory                24                43
1           2    AMERICAS     Crayon Factory                 1                17
2           3        EMEA           Pet Shop                 9                 5
3           4        EMEA     Bonbon Factory                29                31
4           5        APAC           Lil Grey                92                89
5           6    AMERICAS         Jobs Ur Us                79                44
6           7        APAC          Toys R Us                 4                49
7           8        EMEA          Carrefour                90                34
8           9    AMERICAS       Gibberish US                42                84
9          10        APAC          Food Inc.                35                62

解决方案

You can create this plot by first creating a MultiIndex for your hierarchical dataset where level 0 is the Factory Zone and level 1 is the Factory Name:

import numpy as np                 # v 1.19.2
import pandas as pd                # v 1.1.3
import matplotlib.pyplot as plt    # v 3.3.2

df = pd.DataFrame(
    {'Factory Zone': ['AMERICAS', 'AMERICAS', 'AMERICAS', 'AMERICAS', 'APAC',
                      'APAC', 'APAC', 'EMEA', 'EMEA', 'EMEA'],
     'Factory Name': ['Chocolate Factory', 'Crayon Factory', 'Jobs Ur Us',
                      'Gibberish US', 'Lil Grey', 'Toys R Us', 'Food Inc.',
                      'Pet Shop', 'Bonbon Factory','Carrefour'],
     'Production Day 1': [24,1,9,29,92,79,4,90,42,35],
     'Production Day 2': [2,43,17,5,31,89,44,49,34,84]
    })

df.set_index(['Factory Zone', 'Factory Name'], inplace=True)

df

#                                   Production Day 1  Production Day 2
#  Factory Zone       Factory Name      
#      AMERICAS  Chocolate Factory                24                 2
#                   Crayon Factory                 1                43
#                       Jobs Ur Us                 9                17
#                     Gibberish US                29                 5
#          APAC           Lil Grey                92                31
#                        Toys R Us                79                89
#                        Food Inc.                 4                44
#         EMEA            Pet Shop                90                49
#                   Bonbon Factory                42                34
#                        Carrefour                35                84

Like Quang Hoang has proposed, you can create a subplot for each zone and stick them together. The width of each subplot must be corrected according to the number of factories by using the width_ratios argument in the gridspec_kw dictionary so that all the columns have the same width. Then there are limitless formatting choices to make.

In the following example, I choose to show separation lines only between zones by using the minor tick marks for this purpose. Also, because the figure width is limited here to 10 inches only, I rewrite the longer labels on two lines.

# Create figure with a subplot for each factory zone with a relative width
# proportionate to the number of factories
zones = df.index.levels[0]
nplots = zones.size
plots_width_ratios = [df.xs(zone).index.size for zone in zones]
fig, axes = plt.subplots(nrows=1, ncols=nplots, sharey=True, figsize=(10, 4),
                         gridspec_kw = dict(width_ratios=plots_width_ratios, wspace=0))

# Loop through array of axes to create grouped bar chart for each factory zone
alpha = 0.3 # used for grid lines, bottom spine and separation lines between zones
for zone, ax in zip(zones, axes):
    # Create bar chart with grid lines and no spines except bottom one
    df.xs(zone).plot.bar(ax=ax, legend=None, zorder=2)
    ax.grid(axis='y', zorder=1, color='black', alpha=alpha)
    for spine in ['top', 'left', 'right']:
        ax.spines[spine].set_visible(False)
    ax.spines['bottom'].set_alpha(alpha)
    
    # Set and place x labels for factory zones
    ax.set_xlabel(zone)
    ax.xaxis.set_label_coords(x=0.5, y=-0.2)
    
    # Format major tick labels for factory names: note that because this figure is
    # only about 10 inches wide, I choose to rewrite the long names on two lines.
    ticklabels = [name.replace(' ', '\n') if len(name) > 10 else name
                  for name in df.xs(zone).index]
    ax.set_xticklabels(ticklabels, rotation=0, ha='center')
    ax.tick_params(axis='both', length=0, pad=7)
    
    # Set and format minor tick marks for separation lines between zones: note
    # that except for the first subplot, only the right tick mark is drawn to avoid
    # duplicate overlapping lines so that when an alpha different from 1 is chosen
    # (like in this example) all the lines look the same
    if ax.is_first_col():
        ax.set_xticks([*ax.get_xlim()], minor=True)
    else:
        ax.set_xticks([ax.get_xlim()[1]], minor=True)
    ax.tick_params(which='minor', length=55, width=0.8, color=[0, 0, 0, alpha])

# Add legend using the labels and handles from the last subplot
fig.legend(*ax.get_legend_handles_labels(), frameon=False, loc=(0.08, 0.77))

fig.suptitle('Production Quantity by Zone and Factory on both days', y=1.02, size=14);



References: the answer by Quang Hoang, this answer by gyx-hh

这篇关于Python DataFrame-用分组列(至少两列)为数据框绘制条形图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆