How to sum all amounts by date in a pandas dataframe?
Question
I have a dataframe with the fields last_payout and amount. I need to sum all amount values for each month and plot the output.
df[['last_payout','amount']].dtypes
last_payout datetime64[ns]
amount float64
dtype: object
df[['last_payout','amount']].head
<bound method NDFrame.head of last_payout amount
0 2017-02-14 11:00:06 23401.0
1 2017-02-14 11:00:06 1444.0
2 2017-02-14 11:00:06 0.0
3 2017-02-14 11:00:06 0.0
4 2017-02-14 11:00:06 290083.0
I used the code from jezrael's answer to plot the number of transactions per month.
(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
.dt.to_period('M')
.value_counts()
.sort_index()
.plot(kind="bar")
)
[Figure: number of transactions per month]
How do I sum all amount values for each month and plot the output? How should I extend the code above to do this?
I tried to use .sum but didn't succeed.
Recommended answer
PeriodIndex solution:
Use groupby on monthly periods, created with to_period, and aggregate with sum:
df['amount'].groupby(df['last_payout'].dt.to_period('M')).sum().plot(kind='bar')
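To connect this back to the code in the question, here is a minimal sketch that keeps the 2016-2017 year filter and replaces value_counts with a grouped sum. The sample data is made up for illustration (it is not the asker's data), and the plotting call is commented out so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "last_payout": pd.to_datetime([
        "2015-12-30 09:00:00",  # outside the 2016-2017 window, filtered out
        "2017-02-14 11:00:06",
        "2017-02-20 08:30:00",
        "2017-03-14 11:00:06",
    ]),
    "amount": [999.0, 23401.0, 1444.0, 290083.0],
})

# Same year filter as in the question, applied to whole rows.
mask = df["last_payout"].dt.year.between(2016, 2017)
filtered = df.loc[mask]

# Group the amounts by monthly period and sum each group.
monthly = filtered["amount"].groupby(filtered["last_payout"].dt.to_period("M")).sum()
print(monthly)

# monthly.plot(kind="bar")  # uncomment to draw the bar chart (needs matplotlib)
```

The key difference from the question's code is that the filter selects whole rows rather than only the last_payout column, so amount is still available for the sum.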
DatetimeIndex solutions:
Use resample by month ends (M) or month starts (MS) and aggregate with sum. Note that pandas 2.2 and later deprecate the M alias for month end in favor of ME; MS is unchanged.
s = df.resample('M', on='last_payout')['amount'].sum()
#alternative
#s = df.groupby(pd.Grouper(freq='M', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-28 23401.0
2017-03-31 1444.0
2017-04-30 290083.0
Freq: M, Name: amount, dtype: float64
Or:
s = df.resample('MS', on='last_payout')['amount'].sum()
#s = df.groupby(pd.Grouper(freq='MS', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-01 23401.0
2017-03-01 1444.0
2017-04-01 290083.0
Freq: MS, Name: amount, dtype: float64
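One practical difference between the two approaches is how months without any rows are handled. The sketch below uses made-up data with a gap in March to illustrate it; the variable names by_resample and by_period are only for this example.

```python
import pandas as pd

# Hypothetical sample data with no payouts in March 2017.
df = pd.DataFrame({
    "last_payout": pd.to_datetime(["2017-02-14 11:00:06", "2017-04-14 11:00:06"]),
    "amount": [100.0, 250.0],
})

# resample builds a continuous monthly index, so March appears with a sum of 0.0.
by_resample = df.resample("MS", on="last_payout")["amount"].sum()

# groupby on periods only creates groups for months that occur in the data.
by_period = df["amount"].groupby(df["last_payout"].dt.to_period("M")).sum()

print(by_resample)
print(by_period)
```

If the bar chart should show an empty slot for months without payouts, resample is likely the more convenient choice; if only months with data matter, the period-based groupby is enough.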
Then it is necessary to format the x labels:
ax = s.plot(kind='bar')
ax.set_xticklabels(s.index.strftime('%Y-%m'))
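For reference, a small sketch of what the strftime call produces, plus one possible alternative: relabeling a copy of the series before plotting instead of calling set_xticklabels afterwards. The series here is built by hand to mimic the resample('MS') output above, and the plotting line is commented out so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical monthly sums indexed by month start, as resample('MS') would return.
s = pd.Series(
    [23401.0, 1444.0, 290083.0],
    index=pd.date_range("2017-02-01", periods=3, freq="MS"),
    name="amount",
)

# Format each timestamp as year-month; the result is an Index of strings.
labels = s.index.strftime("%Y-%m")
print(list(labels))

# Alternative to set_xticklabels: relabel a copy of the series before plotting.
labeled = s.copy()
labeled.index = labels
# labeled.plot(kind="bar")  # uncomment to draw the chart (needs matplotlib)
```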
Setup:
from io import StringIO
import pandas as pd
temp=u"""last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
# pd.compat.StringIO was removed from pandas, so io.StringIO is used instead
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=[0])
print (df)
last_payout amount
0 2017-02-14 11:00:06 23401.0
1 2017-03-14 11:00:06 1444.0
2 2017-03-14 11:00:06 0.0
3 2017-04-14 11:00:06 0.0
4 2017-04-14 11:00:06 290083.0