Histogram with stacked components
Question
Let's say that I have a value that I've measured every day for the past 90 days. I would like to plot a histogram of the values, but I want to make it easy for the viewer to see where the measurements have accumulated over certain non-overlapping subsets of the past 90 days. I want to do this by "subdividing" each bar of the histogram into chunks: one chunk for the earliest observations, one for more recent ones, and one for the most recent.
This sounds like a job for `df.plot(kind='bar', stacked=True)`, but I'm having trouble getting the details right.
Here's what I have so far:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
data = pd.DataFrame({'values': np.random.randn(90)})
# Assign each value to one of 15 equal-width bins (integer bin codes 0-14)
data['bin'] = pd.cut(data['values'], 15, labels=False)
# Count the observations per bin in each time window
forhist = pd.DataFrame({'first70': data[:70].groupby('bin').count()['values'],
                        'next15': data[70:85].groupby('bin').count()['values'],
                        'last5': data[85:].groupby('bin').count()['values']})
forhist.plot(kind='bar', stacked=True)
plt.show()
```
That gives me a stacked bar chart, but the plot has some shortcomings:
- The bars are stacked in the wrong order. `last5` should be on top and `next15` in the middle, i.e. they should be stacked in the order of the columns in `forhist`.
- There is horizontal space between the bars.
- The x-axis is labeled with integers rather than something indicative of the values the bins represent. My "first choice" would be to have the x-axis labeled exactly as it would be if I just ran `data['values'].hist()`. My "second choice" would be to have the x-axis labeled with the "bin names" that I would get if I did `pd.cut(data['values'], 15)`. In my code, I used `labels=False` because if I didn't, it would have used the bin-edge labels (as strings) as the bar labels and put them in alphabetical order, making the graph basically useless.
What's the best way to approach this? I feel like I'm using very clumsy functions so far.
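For reference, the three issues listed above can each be handled while keeping the pandas bar-chart approach. The sketch below assumes a recent pandas, where `pd.cut` without `labels=False` returns an ordered Categorical of Interval bins; the `windows` mapping is a hypothetical name introduced only for illustration. Counting with `value_counts(sort=False)` keeps the bins in numeric order (the "second choice" labels), selecting the columns explicitly fixes the stacking order, and `width=1.0` closes the gaps.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
data = pd.DataFrame({'values': np.random.randn(90)})

# Keep the Interval bins (no labels=False): the result is an ordered Categorical,
# so its categories run from the lowest bin to the highest, not alphabetically.
bins = pd.cut(data['values'], 15)

# Hypothetical mapping: column name -> positional slice of the 90 days.
windows = {'first70': slice(0, 70), 'next15': slice(70, 85), 'last5': slice(85, 90)}

# value_counts(sort=False) on a Categorical gives one count per category,
# in category order, including empty bins (count 0).
forhist = pd.DataFrame({name: bins.iloc[s].value_counts(sort=False)
                        for name, s in windows.items()})

# pandas stacks bars in column order: first column at the bottom, last on top.
forhist = forhist[['first70', 'next15', 'last5']]

# width=1.0 makes adjacent bars touch, as in a histogram.
ax = forhist.plot(kind='bar', stacked=True, width=1.0, figsize=(9, 5))
ax.set_xlabel('values')
plt.tight_layout()
```

Call `plt.show()` afterwards to display it. The x tick labels are the interval strings such as `(a, b]`, so they may need rotating or thinning with 15 bins.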
Answer
OK, here's one way to attack it, using features of the matplotlib `hist` function itself:
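```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
# One array per time window; stacked=True piles them in list order (first at the bottom).
# .iloc slices by position and excludes the end, so the windows don't overlap.
ax.hist([data['values'].iloc[low:high] for low, high in [(0, 70), (70, 85), (85, 90)]],
        bins=15,        # bin edges are computed over all windows, so they are shared
        stacked=True,
        rwidth=1.0,     # bars span the full bin width: no gaps
        label=['first70', 'next15', 'last5'])
ax.legend()
plt.show()
```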
Which gives a single stacked histogram of the three windows.
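A small extension of the answer, assuming NumPy 1.15 or later for `np.histogram_bin_edges`: computing the 15 edges once from the full series and passing them as `bins` makes the shared binning explicit, so every window is counted against the same edges that `np.histogram(data['values'], bins=15)` would use. The names `edges`, `windows` and `chunks` are illustrative. Note that with `stacked=True` the counts returned by `ax.hist` are cumulative, so the last row holds the bar tops, i.e. the totals for all windows combined.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
data = pd.DataFrame({'values': np.random.randn(90)})

# Shared bin edges computed once from all 90 values (16 edges -> 15 bins).
edges = np.histogram_bin_edges(data['values'], bins=15)

# Illustrative positional windows: first 70 days, next 15, last 5.
windows = [(0, 70), (70, 85), (85, 90)]
chunks = [data['values'].iloc[low:high] for low, high in windows]

fig, ax = plt.subplots(figsize=(9, 5))
n, bin_edges, patches = ax.hist(chunks, bins=edges, stacked=True, rwidth=1.0,
                                label=['first70', 'next15', 'last5'])
ax.legend()

# With stacked=True, row i of n is cumulative, so n[-1] is the full-data histogram.
full_counts, _ = np.histogram(data['values'], bins=edges)
print(np.array_equal(n[-1], full_counts))  # True
```

Because the x-axis here is numeric, its tick labels look like those of a plain `data['values'].hist()`, which is the "first choice" from the question.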