How to aggregate and plot data from pandas dataframe?
I have this dataframe
df[['payout_date','total_value']].head(10)
payout_date total_value
0 2017-02-14T11:00:06 177.313
1 2017-02-14T11:00:06 0.000
2 2017-02-01T00:00:00 0.000
3 2017-02-14T11:00:06 47.392
4 2017-02-14T11:00:06 16.254
5 2017-02-14T11:00:06 125.818
6 2017-02-14T11:00:06 0.000
7 2017-02-14T11:00:06 0.000
8 2017-02-14T11:00:06 0.000
9 2017-02-14T11:00:06 0.000
I am using this code to plot the aggregated sum of total_value within a specific date range by day (and by month), but it plots a bar for each total_value and doesn't sum-aggregate total_value by day.
(df.set_index('payout_date')
.loc['2018-02-01':'2018-02-02']
.groupby('payout_date')
.agg(['sum'])
.reset_index()
.plot(x='payout_date', y='total_value',kind="bar"))
plt.show()
The data is not aggregated; I get a bar for each value in df.
How can I aggregate total_value by date and by month?
I tried answers from this and a couple of other similar questions, but none of them worked for the date format used here.
I also tried adding .dt.to_period('M') to the code, but I get a TypeError: Empty 'DataFrame': no numeric data to plot error.
Setup
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'payout_date': ['2017-02-01T11:00:06', '2017-02-01T11:00:06', '2017-02-02T00:00:00',
                    '2017-02-14T11:00:06', '2017-02-14T11:00:06', '2017-02-15T11:00:06',
                    '2017-02-15T11:00:06', '2017-02-16T11:00:06', '2017-02-16T11:00:06',
                    '2017-02-16T11:00:06'],
    'total_value': [177.313, 22.0, 25.0, 47.391999999999996, 16.254,
                    125.818, 85.0, 42.0, 22.0, 19.0],
})
Use normalize to group by day only:
# Select total_value explicitly so that only the numeric column is summed
df.groupby(pd.DatetimeIndex(df.payout_date).normalize())['total_value'].sum().reset_index()
payout_date total_value
0 2017-02-01 199.313
1 2017-02-02 25.000
2 2017-02-14 63.646
3 2017-02-15 210.818
4 2017-02-16 83.000
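As a variation on the same idea, the sketch below parses the strings once with pd.to_datetime and then groups on the .dt.normalize() values. It assumes that converting the column up front is acceptable for your data, and it uses only the first five rows of the Setup data to stay short.

```python
import pandas as pd

# First five rows of the Setup data, copied to keep the sketch self-contained.
df = pd.DataFrame({
    "payout_date": [
        "2017-02-01T11:00:06", "2017-02-01T11:00:06", "2017-02-02T00:00:00",
        "2017-02-14T11:00:06", "2017-02-14T11:00:06",
    ],
    "total_value": [177.313, 22.0, 25.0, 47.392, 16.254],
})

# Assumption: it is fine to convert the string column to datetime64 in place.
df["payout_date"] = pd.to_datetime(df["payout_date"])

# dt.normalize() resets the time of day to midnight, so all rows from the
# same calendar day end up with the same group key.
daily = (
    df.groupby(df["payout_date"].dt.normalize())["total_value"]
      .sum()
      .reset_index()
)
print(daily)
```

Once the column is a real datetime, later filters and groupings can use datetime comparisons instead of string comparisons.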
Extend the previous command to plot:
days = pd.DatetimeIndex(df.payout_date).normalize().strftime('%Y-%m-%d')
# Plot the per-day sums directly; the group labels become the x-axis ticks
df.groupby(days)['total_value'].sum().plot(kind='bar')
plt.tight_layout()
plt.show()
Output for my sample data:
If you want to apply this on a subset, you can do something like the following:
tmp = df.loc[(df.payout_date > '2017-02-01') & (df.payout_date < '2017-02-15')]
tmp.groupby(
pd.DatetimeIndex(tmp.payout_date) \
.normalize().strftime('%Y-%m-%d'))['total_value'] \
.agg(['sum'])
# Result
sum
2017-02-01 199.313
2017-02-02 25.000
2017-02-14 63.646
This sums only the rows in your desired range.
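The question also asks for totals by month. One possible approach, sketched here under the assumption that the dates parse cleanly with pd.to_datetime, is to group on .dt.to_period('M') and select total_value explicitly so the result stays numeric. The sample rows are illustrative and not taken from the original data.

```python
import pandas as pd

# Illustrative rows spanning three months (hypothetical values).
df = pd.DataFrame({
    "payout_date": [
        "2017-01-30T09:00:00", "2017-02-01T11:00:06",
        "2017-02-14T11:00:06", "2017-03-02T08:30:00",
    ],
    "total_value": [10.0, 177.313, 47.392, 5.0],
})

# Parse the strings so the .dt accessor is available.
dates = pd.to_datetime(df["payout_date"])

# One group per calendar month; only total_value is summed.
monthly = df.groupby(dates.dt.to_period("M"))["total_value"].sum()

# Turn the Period labels into plain strings such as "2017-02",
# which read well as bar labels.
monthly.index = monthly.index.astype(str)
print(monthly)
```

With matplotlib available, monthly.plot(kind='bar') should then draw one bar per month.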