Plotting categorical data counts over time


Question

I have a DataFrame (df) with a DatetimeIndex and a column of categorical data (ETH), and I'd like to plot the category counts over time. The rows are indexed by day, and ideally I'd like to plot the counts by year.

import pandas as pd

df = pd.DataFrame({
    'County': {
        0: 'Bexar',
        3: 'Nueces',
        4: 'Kerr',
        9: 'Harris',
        13: 'Hidalgo'},
    'Date': {
        0: '2012-10-28 00:00:00',
        3: '2012-04-16 00:00:00',
        4: '2013-09-04 00:00:00',
        9: '2013-01-22 00:00:00',
        13: '2013-09-26 00:00:00'},
    'ETH': {
        0: 'Red',
        3: 'Green',
        4: 'Red',
        9: 'Green',
        13: 'Red'}
})
# The strings are ISO-formatted (year first), so no dayfirst or format hints are needed
df['Date'] = pd.to_datetime(df['Date'])
df['ETH'] = df['ETH'].astype('category')
df = df.set_index('Date')

However, no combination of groupby or pivot gives me anything remotely like what I want, even though I know this should be fairly simple. I can't seem to find a standard approach for this. Any help?

Recommended answer

The code below first groups by the category column 'ETH' and then iterates over each of the groups.

For each group, it then groups by the year of the DatetimeIndex using a lambda function and returns the number of rows in that year. It then plots these counts.

The year is plotted as a plain number rather than a date, which is why the x-axis looks a bit strange. You could probably convert it back to a date (say, 1 January of each year) to make it prettier. I've adjusted the limits slightly using plt.xlim and plt.ylim to make the plot easier to read.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'County': {
        0: 'Bexar',
        3: 'Nueces',
        4: 'Kerr',
        9: 'Harris',
        13: 'Hidalgo'},
    'Date': {
        0: '2012-10-28 00:00:00',
        3: '2012-04-16 00:00:00',
        4: '2013-09-04 00:00:00',
        9: '2013-01-22 00:00:00',
        13: '2013-09-26 00:00:00'},
    'ETH': {
        0: 'Red',
        3: 'Green',
        4: 'Red',
        9: 'Green',
        13: 'Red'}
})
# The strings are ISO-formatted (year first), so no dayfirst or format hints are needed
df['Date'] = pd.to_datetime(df['Date'])
df['ETH'] = df['ETH'].astype('category')
df = df.set_index('Date')

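# Split the rows by category; observed=True keeps only categories that actually occur
grouped = df.groupby('ETH', observed=True)

for key, group in grouped:
    # Group this category's rows by the year of their index and count them
    data = group.groupby(lambda x: x.year).count()
    data['ETH'].plot(label=key)

# Widen the axis limits so the points do not sit on the edges of the plot
plt.xlim(2011, 2014)
plt.ylim(0, 3)

plt.legend()

plt.show()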
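
The suggestion above to convert the years back into dates could be sketched as follows. This is not part of the original answer: the helper name yearly_counts and the shortened sample frame are assumptions made for illustration. It counts the rows per year and category in one pass, then replaces each integer year with 1 January of that year so the x-axis is a real date axis.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Shortened version of the sample data from the question (County column omitted)
df = pd.DataFrame(
    {"ETH": ["Red", "Green", "Red", "Green", "Red"]},
    index=pd.to_datetime(
        ["2012-10-28", "2012-04-16", "2013-09-04", "2013-01-22", "2013-09-26"]
    ),
)
df.index.name = "Date"
df["ETH"] = df["ETH"].astype("category")


def yearly_counts(frame, column="ETH"):
    """Hypothetical helper: rows per (year, category), one column per category."""
    counts = (
        frame.groupby([frame.index.year, column], observed=True)
        .size()
        .unstack(fill_value=0)
    )
    # Turn the integer years back into dates (1 January of each year)
    counts.index = pd.to_datetime(counts.index.astype(str), format="%Y")
    counts.index.name = "Year"
    return counts


counts = yearly_counts(df)

if __name__ == "__main__":
    # One line per category, drawn against a date x-axis
    ax = counts.plot(marker="o")
    ax.set_ylabel("Count")
    plt.show()
```

Because the result has one column per category, a single call to the DataFrame's plot method draws every line, so the explicit loop over groups is no longer needed.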

Yes, I realise the colors don't match the ETH values, so 'Green' is plotted in blue and 'Red' is plotted in green :P
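
If the mismatched colors are a problem, one possible fix is to pass an explicit color for each category. The sketch below is not from the original answer. It assumes that every category label, lower-cased, happens to be a valid matplotlib color name, which is true for 'Red' and 'Green' but would not hold for arbitrary categories. The names colors and plot_counts_by_year are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Shortened version of the sample data from the question (County column omitted)
df = pd.DataFrame(
    {"ETH": ["Red", "Green", "Red", "Green", "Red"]},
    index=pd.to_datetime(
        ["2012-10-28", "2012-04-16", "2013-09-04", "2013-01-22", "2013-09-26"]
    ),
)
df.index.name = "Date"
df["ETH"] = df["ETH"].astype("category")

# Assumption: each label, lower-cased, is a named matplotlib color
colors = {label: label.lower() for label in df["ETH"].cat.categories}


def plot_counts_by_year(frame, colors):
    """Hypothetical helper: one line per category, drawn in that category's color."""
    fig, ax = plt.subplots()
    for label, group in frame.groupby("ETH", observed=True):
        # Number of rows for this category in each year
        per_year = group.groupby(group.index.year).size()
        per_year.plot(ax=ax, label=label, color=colors[label], marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Count")
    ax.legend()
    return ax


if __name__ == "__main__":
    plot_counts_by_year(df, colors)
    plt.show()
```

For categories whose names are not color names, the colors dictionary would have to be written out by hand.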
