Histogram with stacked components

Question

Let's say that I have a value that I've measured every day for the past 90 days. I would like to plot a histogram of the values, but I want to make it easy for the viewer to see where the measurements have accumulated over certain non-overlapping subsets of the past 90 days. I want to do this by "subdividing" each bar of the histogram into chunks: one chunk for the earliest observations, one for more recent ones, and one for the most recent.

This sounds like a job for df.plot(kind='bar', stacked=True) but I'm having trouble getting the details right.

Here's what I have so far:

import numpy as np
import pandas as pd
import seaborn as sbn

np.random.seed(0)

data = pd.DataFrame({'values': np.random.randn(90)})
data['bin'] = pd.cut(data['values'], 15, labels=False)
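# Count observations per bin for each window (count the 'values' column;
# the grouping column 'bin' is not available in the result of count())
forhist = pd.DataFrame({'first70': data[:70].groupby('bin')['values'].count(),
                        'next15': data[70:85].groupby('bin')['values'].count(),
                        'last5': data[85:].groupby('bin')['values'].count()})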

forhist.plot(kind='bar', stacked=True)

That gives me a stacked bar chart, but the plot has a few shortcomings:

  • The bars are stacked in the wrong order. last5 should be on top and next15 in the middle, i.e. they should be stacked in the order of the columns in forhist.
  • There is horizontal space between the bars.
  • The x-axis is labeled with integers rather than something indicative of the values the bins represent. My "first choice" would be to have the x-axis labeled exactly as it would be if I just ran data['values'].hist(). My "second choice" would be to have the x-axis labeled with the "bin names" that I would get if I did pd.cut(data['values'], 15). In my code, I used labels=False because otherwise the bin edge labels (as strings) would have been used as the bar labels and sorted alphabetically, making the graph basically useless.

What's the best way to approach this? I feel like I'm using very clumsy functions so far.

Recommended answer

Ok, here's one way to attack it, using features from the matplotlib hist function itself:

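import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
# One array per time window; .iloc slices by position and excludes the end
# point, so the windows do not overlap (.ix has been removed from pandas)
ax.hist([data['values'].iloc[low:high] for low, high in [(0, 70), (70, 85), (85, 90)]],
        bins=15,
        stacked=True,  # stack the groups in the order they are listed
        rwidth=1.0,    # bars fill the full bin width, leaving no gaps
        label=['first70', 'next15', 'last5'])
ax.legend()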

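This gives a stacked histogram in which the x-axis is in data units, the bars touch, and the groups are stacked in the order they are listed.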
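To see what the stacked hist call is doing, the same per-bin counts can be computed without plotting. The following is a minimal sketch, not part of the original answer: it assumes the 15 bin edges are derived from the full set of 90 values and shared by all three groups, and the names edges, groups and counts are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
values = pd.Series(np.random.randn(90), name='values')

# Shared bin edges computed once from all 90 values (assumption: this mirrors
# what a stacked histogram with bins=15 uses for every group).
edges = np.histogram_bin_edges(values, bins=15)

# Illustrative group names mapped to positional, non-overlapping slices.
groups = {'first70': slice(0, 70), 'next15': slice(70, 85), 'last5': slice(85, 90)}

# Count each group on the same edges; columns keep the insertion order,
# which is also the stacking order. Edge labels are rounded for readability only.
counts = pd.DataFrame(
    {name: np.histogram(values.iloc[sl], bins=edges)[0] for name, sl in groups.items()},
    index=pd.IntervalIndex.from_breaks(np.round(edges, 2), closed='left'),
)

# The top of each stack equals the ordinary histogram of all values.
total, _ = np.histogram(values, bins=edges)
stack_tops = counts.sum(axis=1).to_numpy()
```

If a pandas-based plot is preferred, counts.plot(kind='bar', stacked=True, width=1.0) should draw the same stacks with the bin intervals as x-axis labels and no gaps between bars, although the axis is then categorical rather than numeric.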
