Stacked bar plot from a loop not adding the different components of the bars
Problem description
I have a dataset like this (the number of columns and rows can vary, which is why I need to define a function for plotting):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plot_df = pd.DataFrame({
'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})
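Before looking at the plot, it is worth confirming that the data really is normalized. A minimal check, assuming the `plot_df` above, sums each row with `DataFrame.sum(axis=1)`:

```python
import numpy as np
import pandas as pd

plot_df = pd.DataFrame({
    'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
    'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
    'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
    'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})

# One total per group: add up the columns of each row
row_totals = plot_df.sum(axis=1)

# Every row should equal 1 up to the 6-decimal rounding of the inputs
assert np.allclose(row_totals, 1.0, atol=1e-5)
print(row_totals.round(6).tolist())
```

If the assertion passes, the data itself is fine, and any bar that does not reach 1.0 is a plotting issue rather than a normalization issue.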
For this data, I would like to create a plot like the one produced by this code, which handles a fixed number of columns:
# configure plot
N = len(plot_df) # number of groups
num_y_cats = len(plot_df.columns) # number of y-categories (responses)
ind = np.arange(N) # x locations for the groups
width = 0.35 # width of bars
p1 = plt.bar(ind, plot_df.iloc[:,0], width)
p2 = plt.bar(ind, plot_df.iloc[:,1], width)
p3 = plt.bar(ind, plot_df.iloc[:,2], width)
p4 = plt.bar(ind, plot_df.iloc[:,3], width)
plt.ylabel('[%]')
plt.title('Responses by country')
x_ticks_names = tuple([item for item in plot_df.index])
plt.xticks(ind, x_ticks_names)
plt.yticks(np.arange(0, 1.1, 0.1)) # ticks from, to, steps
plt.legend((p1[0], p2[0], p3[0], p4[0]), ('decl', 'dk', 'yes', 'no'))
plt.show()
This produces a plot with two issues that I cannot resolve on my own:
- The bars don't add up to 1.0, although they should, since I created the original df with a normalization (plot_df['sum'] = plot_df['decl'] + plot_df['dk'] + plot_df['yes'] + plot_df['no']).
- I want to define a function that creates the same plot for dfs with a variable number of rows and columns, but I am stuck on the part that creates the different bar layers. So far, I have:
def bar_plot(plot_df):
    ''' input: data frame where rows are groups; columns are plot components to be stacked '''
    # configure plot
    N = len(plot_df) # number of groups
    num_y_cats = len(plot_df.columns) # number of y-categories (responses)
    ind = np.arange(N) # x locations for the groups
    width = 0.35 # width of bars
    for i in range(num_y_cats): # for every response in the number of responses, e.g. 'Yes', 'No' etc.
        p = plt.bar(ind, plot_df.iloc[:,i], width) # plot containing the response
    plt.ylabel('[%]')
    plt.title('Responses by group')
    x_ticks_names = tuple([item for item in plot_df.index]) # create a tuple containing all [country] names
    plt.xticks(ind, x_ticks_names)
    plt.yticks(np.arange(0, 1.1, 0.1)) # ticks from, to, steps
    plt.show()
However, the problem here is that the loop doesn't properly add the different layers, and I cannot figure out how to do it. Could someone give me a pointer?
Recommended answer
Problem number 1, if I understand you correctly, is that the height of the bars is not 1 (i.e. the sum of all the fractions). Your code
p1 = plt.bar(ind, plot_df.iloc[:,0], width)
p2 = plt.bar(ind, plot_df.iloc[:,1], width)
...
creates four bar plots that all start from 0 on the y-axis, so they are drawn over one another rather than stacked. What we want is for p2 to start on top of p1, p3 to start on top of p2, and so on. To do this, we can specify the bottom argument of plt.bar (which defaults to 0). So,
p1 = plt.bar(ind, plot_df.iloc[:,0], width)
p2 = plt.bar(ind, plot_df.iloc[:,1], width, bottom=plot_df.iloc[:,0])
...
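Written out for all four columns of the example data, with each layer's bottom set to the sum of the layers below it, the static version might look like the sketch below. It assumes the non-interactive Agg backend so it runs without a display (in an interactive session you would drop that line and call plt.show() at the end), and it reads each rectangle's `get_y()` and `get_height()` to confirm where the top layer ends.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plot_df = pd.DataFrame({
    'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
    'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
    'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
    'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})

ind = np.arange(len(plot_df))  # x locations for the groups
width = 0.35                   # width of bars

# Each layer starts where the layers below it end
p1 = plt.bar(ind, plot_df.iloc[:, 0], width)
p2 = plt.bar(ind, plot_df.iloc[:, 1], width, bottom=plot_df.iloc[:, 0])
p3 = plt.bar(ind, plot_df.iloc[:, 2], width,
             bottom=plot_df.iloc[:, 0] + plot_df.iloc[:, 1])
p4 = plt.bar(ind, plot_df.iloc[:, 3], width,
             bottom=plot_df.iloc[:, 0] + plot_df.iloc[:, 1] + plot_df.iloc[:, 2])

plt.ylabel('[%]')
plt.legend((p1[0], p2[0], p3[0], p4[0]), list(plot_df.columns))

# Top edge of each bar in the last layer = bottom + height = row total
tops = [rect.get_y() + rect.get_height() for rect in p4]
print([round(t, 6) for t in tops])
```

Every top should now sit at (approximately) 1.0, matching the row totals of the data.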
For p3 we want bottom to start at the sum of plot_df.iloc[:,0] and plot_df.iloc[:,1]. We can do this either explicitly or with np.sum, as in np.sum(plot_df.iloc[:,:i], axis=1), which adds up the first i columns row by row. The latter of course has the advantage that we can sum over an arbitrary number of columns (as you want in your function).
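To check that `np.sum(plot_df.iloc[:, :i], axis=1)` yields the right bottoms, here is a small sketch on the question's data. It also shows an equivalent route via `DataFrame.cumsum(axis=1)` (a swapped-in alternative, not what the answer uses): column `i - 1` of the running total is the top of layer `i - 1`, which is exactly the bottom of layer `i`.

```python
import numpy as np
import pandas as pd

plot_df = pd.DataFrame({
    'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
    'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
    'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
    'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})

n_cols = len(plot_df.columns)

# Bottoms as in the answer: row-wise sum of every column left of column i
bottoms_sum = [np.sum(plot_df.iloc[:, :i], axis=1) for i in range(1, n_cols)]

# Same bottoms from a running total across the columns
running = plot_df.cumsum(axis=1)
bottoms_cumsum = [running.iloc[:, i - 1] for i in range(1, n_cols)]

# Both approaches agree for every layer after the first
for a, b in zip(bottoms_sum, bottoms_cumsum):
    assert np.allclose(a, b)
```

The cumulative sum computes all bottoms in one pass, whereas the `np.sum` version re-adds the earlier columns for each layer; for a handful of columns either is fine.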
As for your function, I gave it a shot. You will probably have to perfect it yourself:
def bar_plot(plot_df):
    N = len(plot_df) # number of groups
    ind = np.arange(N) # x locations for the groups
    width = 0.35 # width of bars
    p_s = []
    p_s.append(plt.bar(ind, plot_df.iloc[:,0], width))
    for i in range(1,len(plot_df.columns)):
        # each layer starts on top of the sum of all previous columns
        p_s.append(plt.bar(ind, plot_df.iloc[:,i], width,
                           bottom=np.sum(plot_df.iloc[:,:i], axis=1)))
    plt.ylabel('[%]')
    plt.title('Responses by country')
    x_ticks_names = tuple([item for item in plot_df.index])
    plt.xticks(ind, x_ticks_names)
    plt.yticks(np.arange(0, 1.1, 0.1)) # ticks from, to, steps
    plt.legend(p_s, plot_df.columns)
    plt.show()
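A variant of the same idea, sketched with matplotlib's object-oriented API (`plt.subplots`, `ax.bar`) instead of the `plt.*` calls. It keeps a running `bottom` array rather than re-summing the previous columns on every iteration, and it returns the `Axes` so the caller decides when to show or save the figure. The function name `stacked_bar_plot` and its `title`/`width` parameters are hypothetical, not part of the answer above; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def stacked_bar_plot(plot_df, title="Responses by group", width=0.35):
    """Stack every column of plot_df; each row is one group on the x-axis."""
    fig, ax = plt.subplots()
    ind = np.arange(len(plot_df))     # x locations for the groups
    bottom = np.zeros(len(plot_df))   # running top of the stack so far

    for col in plot_df.columns:
        values = plot_df[col].to_numpy()
        ax.bar(ind, values, width, bottom=bottom, label=str(col))
        bottom = bottom + values      # next layer starts on top of this one

    ax.set_ylabel('[%]')
    ax.set_title(title)
    ax.set_xticks(ind)
    ax.set_xticklabels([str(name) for name in plot_df.index])
    ax.set_yticks(np.arange(0, 1.1, 0.1))  # ticks from, to, steps
    ax.legend()
    return ax


plot_df = pd.DataFrame({
    'decl': [0.000000, 0.000000, 0.000000, 0.000667, 0.000833, 0.000833, 0.000000],
    'dk': [0.003333, 0.000000, 0.000000, 0.001333, 0.001667, 0.000000, 0.000000],
    'yes': [0.769167, 0.843333, 0.762000, 0.666000, 0.721667, 0.721667, 0.775833],
    'no': [0.227500, 0.156667, 0.238000, 0.332000, 0.275833, 0.277500, 0.224167]})

ax = stacked_bar_plot(plot_df)

# Each ax.bar call adds one BarContainer; the last one is the top layer
tops = [rect.get_y() + rect.get_height() for rect in ax.containers[-1]]
```

Because the loop runs over `plot_df.columns`, the same function handles any number of rows and columns. If a pandas-only route is acceptable, `plot_df.plot(kind='bar', stacked=True)` also draws one stacked layer per column.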