Stacked bar plot from loop not adding different components of bars


Problem description

I have a dataset like this (the number of columns and rows can vary, which is why I need to define a function for plotting).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for the plt.* calls below
plot_df = pd.DataFrame({
  'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
  'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
  'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
  'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})

For this data, I would like to create a plot like the one produced by this code, which is written for a fixed number of columns:

# configure plot
N = len(plot_df) # number of groups
num_y_cats = len(plot_df.columns) # number of y-categories (responses)
ind = np.arange(N) # x locations for the groups
width = 0.35 # width of bars

p1 = plt.bar(ind, plot_df.iloc[:,0], width)
p2 = plt.bar(ind, plot_df.iloc[:,1], width)
p3 = plt.bar(ind, plot_df.iloc[:,2], width)
p4 = plt.bar(ind, plot_df.iloc[:,3], width)

plt.ylabel('[%]')
plt.title('Responses by country')

x_ticks_names = tuple([item for item in plot_df.index])

plt.xticks(ind, x_ticks_names)
plt.yticks(np.arange(0, 1.1, 0.1)) # ticks from, to, steps
plt.legend((p1[0], p2[0], p3[0], p4[0]), ('decl', 'dk', 'yes', 'no'))
plt.show()

This gives me the following plot, which has two issues I cannot resolve and would like help with:

  1. The numbers don't add up to 1.0 - although they should, as I created the original df with a normalization (plot_df['sum'] = plot_df['decl'] + plot_df['dk'] + plot_df['yes'] + plot_df['no']).
  2. I want to define a function that creates the same plot for dfs with a variable number of rows and columns, but I am stuck on the part that creates the different bar layers. Thus far, I have:

def bar_plot(plot_df):
    ''' input: data frame where rows are groups; columns are plot components to be stacked '''

    # configure plot
    N = len(plot_df) # number of groups
    num_y_cats = len(plot_df.columns) # number of y-categories (responses)
    ind = np.arange(N) # x locations for the groups
    width = 0.35 # width of bars

    for i in range(num_y_cats): # for every response in the number of responses, e.g. 'Yes', 'No' etc.
        p = plt.bar(ind, plot_df.iloc[:,i], width) # plot containing the response

    plt.ylabel('[%]')
    plt.title('Responses by group')

    x_ticks_names = tuple([item for item in plot_df.index]) # create a tuple containing all [country] names

    plt.xticks(ind, x_ticks_names)
    plt.yticks(np.arange(0, 1.1, 0.1)) # ticks from, to, steps
    plt.show()

However, the problem here is that the loop doesn't properly add the different layers, and I cannot figure out how to do it. Could someone give me a pointer?

Recommended answer

Problem number 1, if I understand you correctly, is that the height of the bars is not 1 (i.e. the sum of all the fractions). Your code

p1 = plt.bar(ind, plot_df.iloc[:,0], width)
p2 = plt.bar(ind, plot_df.iloc[:,1], width)
...

creates four bar plots, all starting from 0 on the y-axis, so the bars overlap instead of stacking. What we want is for p2 to start on top of p1, p3 to start on top of p2, and so on. To do this we can specify the bottom argument (which defaults to 0) in plt.bar. So,

p1 = plt.bar(ind, plot_df.iloc[:,0], width)
p2 = plt.bar(ind, plot_df.iloc[:,1], width, bottom=plot_df.iloc[:,0])
...

For p3 we want bottom to be the sum of plot_df.iloc[:,0] and plot_df.iloc[:,1]. We can do this either explicitly or with np.sum, like so: np.sum(plot_df.iloc[:,:i], axis=1). The latter has the advantage that it sums over an arbitrary number of columns, which is what you want in your function.
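To make the bottom offsets concrete, here is a minimal sketch that computes them for two rows taken from the question's data and checks that the top of the last layer lands at 1.0. The names heights, tops and bottoms are my own, and using np.cumsum is just one assumed way to get the same values as the np.sum expression above.

```python
import numpy as np
import pandas as pd

# Two rows taken from the question's data (rows 0 and 3)
plot_df = pd.DataFrame({
    'decl': [0.000000, 0.000667],
    'dk':   [0.003333, 0.001333],
    'yes':  [0.769167, 0.666000],
    'no':   [0.227500, 0.332000]})

heights = plot_df.to_numpy()       # one column per bar layer
tops = np.cumsum(heights, axis=1)  # running total across the columns
bottoms = tops - heights           # where each layer has to start

# bottoms[:, i] matches the np.sum expression from the text for every layer i >= 1
for i in range(1, plot_df.shape[1]):
    assert np.allclose(bottoms[:, i], np.sum(plot_df.iloc[:, :i], axis=1))

print(bottoms)
print(tops[:, -1])  # total height of each stacked bar
```

The first layer starts at 0, and each later layer starts where the previous one ended, so the stacked height per row equals the row sum.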

As for your function, I gave it a shot. You will probably have to refine it yourself:

def bar_plot(plot_df):
    ind = np.arange(len(plot_df)) # x locations for the groups
    width = 0.35 # width of bars

    p_s = []
    # first layer starts at 0
    p_s.append(plt.bar(ind, plot_df.iloc[:,0], width))
    # every later layer starts on top of the sum of the layers before it
    for i in range(1,len(plot_df.columns)):
        p_s.append(plt.bar(ind, plot_df.iloc[:,i], width,
                           bottom=np.sum(plot_df.iloc[:,:i], axis=1)))

    plt.ylabel('[%]')
    plt.title('Responses by country')

    x_ticks_names = tuple([item for item in plot_df.index])

    plt.xticks(ind, x_ticks_names)
    plt.yticks(np.arange(0, 1.1, 0.1)) # ticks from, to, steps
    plt.legend(p_s, plot_df.columns)
    plt.show()
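As a possible refinement, the sketch below keeps a running bottom array instead of re-summing the earlier columns on every iteration, and draws on an Axes object so the caller decides when to show or save the figure. The name stacked_bar_plot, the width default and the Agg backend are assumptions of mine, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def stacked_bar_plot(plot_df, width=0.35):
    """Hypothetical helper: rows are groups, columns are the stacked components."""
    ind = np.arange(len(plot_df))      # x locations for the groups
    bottom = np.zeros(len(plot_df))    # running height of each bar so far
    fig, ax = plt.subplots()
    for col in plot_df.columns:
        values = plot_df[col].to_numpy()
        ax.bar(ind, values, width, bottom=bottom, label=str(col))
        bottom = bottom + values       # next layer starts on top of this one
    ax.set_ylabel('[%]')
    ax.set_title('Responses by group')
    ax.set_xticks(ind)
    ax.set_xticklabels([str(name) for name in plot_df.index])
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.legend()
    return fig, ax

plot_df = pd.DataFrame({
    'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
    'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
    'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
    'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})

fig, ax = stacked_bar_plot(plot_df)
```

If you do not need control over the individual layers, pandas can also draw this directly with plot_df.plot(kind='bar', stacked=True).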
