Plotting categorical data counts over time
Question
I have a DataFrame (df) with a DatetimeIndex and a column containing categorical data (ETH). I'd like to plot the category counts over time. The rows are indexed by day, and ideally I'd plot the counts by year.
import pandas as pd

df = pd.DataFrame({
    'County': {
        0: 'Bexar',
        3: 'Nueces',
        4: 'Kerr',
        9: 'Harris',
        13: 'Hidalgo'},
    'Date': {
        0: '2012-10-28 00:00:00',
        3: '2012-04-16 00:00:00',
        4: '2013-09-04 00:00:00',
        9: '2013-01-22 00:00:00',
        13: '2013-09-26 00:00:00'},
    'ETH': {
        0: 'Red',
        3: 'Green',
        4: 'Red',
        9: 'Green',
        13: 'Red'}
})
# The dates are ISO-formatted (year-month-day), so no parsing hints are needed
df['Date'] = pd.to_datetime(df['Date'])
df['ETH'] = df['ETH'].astype('category')
df = df.set_index('Date')
However, no combination of groupby or pivot gives me anything remotely like what I want, even though I know this should be fairly simple. I can't seem to find a standard approach for this. Any help?
Recommended answer
The code below first groups by the category 'ETH' and then iterates over each of the groups.
For each group, it then groups by the year of the DatetimeIndex using a lambda function and counts the rows in that year. It then plots these counts.
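To make the counting step concrete, here is a minimal sketch that builds the same per-year, per-category counts as a single table. It swaps the loop for a groupby/size/unstack chain, which is an alternative formulation rather than the answer's own method, and it uses a trimmed copy of the question's sample data with ETH kept as plain strings.

```python
import pandas as pd

# Trimmed copy of the question's sample data (dates and ETH values only).
df = pd.DataFrame(
    {"ETH": ["Red", "Green", "Red", "Green", "Red"]},
    index=pd.to_datetime(
        ["2012-10-28", "2012-04-16", "2013-09-04", "2013-01-22", "2013-09-26"]
    ),
)

# Group by (year of the index, category), count the rows in each pair,
# then pivot the categories into columns. Missing pairs become 0.
counts = df.groupby([df.index.year, "ETH"]).size().unstack(fill_value=0)
print(counts)
```

Each row of counts is a year and each column is a category, so counts.loc[2013, "Red"] is the number of 'Red' rows dated in 2013.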
The year is plotted as a number (not a date), which is why the x-axis looks a bit strange. You could probably convert it back to a date (say, 1 January of each year) to make it prettier. I've adjusted the limits a bit using plt.xlim and plt.ylim to make the plot easier to read.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'County': {
        0: 'Bexar',
        3: 'Nueces',
        4: 'Kerr',
        9: 'Harris',
        13: 'Hidalgo'},
    'Date': {
        0: '2012-10-28 00:00:00',
        3: '2012-04-16 00:00:00',
        4: '2013-09-04 00:00:00',
        9: '2013-01-22 00:00:00',
        13: '2013-09-26 00:00:00'},
    'ETH': {
        0: 'Red',
        3: 'Green',
        4: 'Red',
        9: 'Green',
        13: 'Red'}
})
# The dates are ISO-formatted (year-month-day), so no parsing hints are needed
df['Date'] = pd.to_datetime(df['Date'])
df['ETH'] = df['ETH'].astype('category')
df = df.set_index('Date')

# Split the rows by category; observed=True keeps only categories present in the data
grouped = df.groupby('ETH', observed=True)
for key, group in grouped:
    # Count the rows per calendar year within this category
    data = group.groupby(lambda x: x.year).count()
    data['ETH'].plot(label=key)

# Widen the axes slightly so the lines are not drawn on the plot edges
plt.xlim(2011, 2014)
plt.ylim(0, 3)
plt.legend()
plt.show()
And yes, I realise the colours don't match the ETH values, so 'Green' is plotted in blue and 'Red' is plotted in green :P
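Both caveats in this answer (the numeric x-axis and the line colours not matching the category names) could be handled along the following lines. This is a minimal sketch, not the answerer's code: it builds the counts with a single groupby/size/unstack chain, converts each year to a 1 January timestamp, and uses a hypothetical colours mapping that assumes the category labels are 'Red' and 'Green' as in the sample data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Trimmed copy of the question's sample data (dates and ETH values only).
df = pd.DataFrame(
    {"ETH": ["Red", "Green", "Red", "Green", "Red"]},
    index=pd.to_datetime(
        ["2012-10-28", "2012-04-16", "2013-09-04", "2013-01-22", "2013-09-26"]
    ),
)

# Years as rows, categories as columns, row counts as values.
counts = df.groupby([df.index.year, "ETH"]).size().unstack(fill_value=0)

# Turn each integer year into a timestamp for 1 January of that year,
# so matplotlib draws a date axis instead of a numeric one.
counts.index = pd.to_datetime(counts.index.astype(str), format="%Y")

# Assumed mapping from category label to line colour (illustrative only).
colours = {"Red": "red", "Green": "green"}

ax = counts.plot(color=[colours[c] for c in counts.columns], marker="o")
ax.set_ylim(0, 3)
ax.set_xlabel("Year")
ax.set_ylabel("Count")
ax.legend(title="ETH")
# plt.show()  # uncomment to display the figure interactively
```

With real data that has more categories, the colours mapping would need an entry for each one, or the color argument could be dropped to fall back on matplotlib's default cycle.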