How to sum all amounts by date in pandas dataframe?


Question

I have a dataframe with the fields last_payout and amount. I need to sum all amount values for each month and plot the output.

df[['last_payout','amount']].dtypes

last_payout    datetime64[ns]
amount           float64
dtype: object

df[['last_payout','amount']].head()

          last_payout    amount
0 2017-02-14 11:00:06   23401.0
1 2017-02-14 11:00:06    1444.0
2 2017-02-14 11:00:06       0.0
3 2017-02-14 11:00:06       0.0
4 2017-02-14 11:00:06  290083.0

I used the code from jezrael's answer to plot the number of transactions per month.

(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
         .dt.to_period('M')
         .value_counts()
         .sort_index()
         .plot(kind="bar")
)

This produces a bar chart of the number of transactions per month.

How do I sum all amount values for each month and plot the output? How should I extend the code above to do this?

I tried to use .sum but didn't succeed.

Recommended answer

PeriodIndex solution:

Group by month periods created with to_period and aggregate with sum:

df['amount'].groupby(df['last_payout'].dt.to_period('M')).sum().plot(kind='bar')
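To connect this with the question's code, here is a minimal sketch that keeps the same year filter but sums amount per month instead of counting rows. The sample rows (including the 2015 row, added only to show the filter at work) and the names csv_text, in_range, subset and monthly are assumptions made for illustration, not part of the original post.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; the 2015 row exists only to show the year filter.
csv_text = """last_payout,amount
2015-12-30 09:00:00,500.0
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
df = pd.read_csv(StringIO(csv_text), parse_dates=["last_payout"])

# Same year filter as in the question, but keep whole rows
# so that the amount column is still available afterwards.
in_range = df["last_payout"].dt.year.between(2016, 2017)
subset = df.loc[in_range]

# Group amount by the month period of last_payout and sum each group.
monthly = subset["amount"].groupby(subset["last_payout"].dt.to_period("M")).sum()
print(monthly)
```

Calling monthly.plot(kind="bar") on the result should then draw one bar per month, with the period values as x labels.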


DatetimeIndex solutions:

Use resample by month ends (M) or month starts (MS) and aggregate with sum. In pandas 2.2 and later the month-end alias for resample is ME, and M is deprecated there.

s = df.resample('M', on='last_payout')['amount'].sum()
#alternative
#s = df.groupby(pd.Grouper(freq='M', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-28     23401.0
2017-03-31      1444.0
2017-04-30    290083.0
Freq: M, Name: amount, dtype: float64

Or:

s = df.resample('MS', on='last_payout')['amount'].sum()
#s = df.groupby(pd.Grouper(freq='MS', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-01     23401.0
2017-03-01      1444.0
2017-04-01    290083.0
Freq: MS, Name: amount, dtype: float64
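One practical difference between the approaches shows up when a month has no rows at all. The sketch below uses made-up data with a gap in March (an assumption for illustration) and compares resample, pd.Grouper and the to_period grouping. It uses MS because that alias behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical data with no rows in March 2017.
df = pd.DataFrame({
    "last_payout": pd.to_datetime([
        "2017-02-14 11:00:06",
        "2017-04-14 11:00:06",
        "2017-04-20 08:30:00",
    ]),
    "amount": [23401.0, 0.0, 290083.0],
})

# Both DatetimeIndex variants create a bin for every month in the range,
# so the empty March bin is present with a sum of 0.0.
by_resample = df.resample("MS", on="last_payout")["amount"].sum()
by_grouper = df.groupby(pd.Grouper(freq="MS", key="last_payout"))["amount"].sum()

# Grouping by the period values only yields months that actually occur.
by_period = df["amount"].groupby(df["last_payout"].dt.to_period("M")).sum()

print(by_resample)
print(by_period)
```

If the chart should show an empty slot for months without payouts, the resample or Grouper form is probably the more convenient choice. If only months with data matter, the to_period grouping is enough.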

Then format the x-axis labels:

ax = s.plot(kind='bar')
ax.set_xticklabels(s.index.strftime('%Y-%m'))
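To see what those labels look like without drawing anything, the sketch below builds a small series shaped like the MS result and formats its index in two ways. The series is rebuilt by hand here (an assumption, so that the snippet runs on its own), and labels and period_labels are illustrative names.

```python
import pandas as pd

# Hand-built stand-in for the monthly sums with a DatetimeIndex.
s = pd.Series(
    [23401.0, 1444.0, 290083.0],
    index=pd.to_datetime(["2017-02-01", "2017-03-01", "2017-04-01"]),
    name="amount",
)

# Year-month strings, as passed to set_xticklabels in the answer.
labels = s.index.strftime("%Y-%m")
print(list(labels))

# Converting the index to monthly periods gives the same text.
period_labels = s.index.to_period("M").astype(str)
print(list(period_labels))
```

This also suggests why the PeriodIndex solution needs no extra label formatting: monthly periods already render as year and month.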

Setup:

from io import StringIO

import pandas as pd

temp = u"""last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
# pd.compat.StringIO was removed from pandas, so use io.StringIO instead.
# After testing, replace 'StringIO(temp)' with 'filename.csv'.
df = pd.read_csv(StringIO(temp), parse_dates=[0])
print(df)
          last_payout    amount
0 2017-02-14 11:00:06   23401.0
1 2017-03-14 11:00:06    1444.0
2 2017-03-14 11:00:06       0.0
3 2017-04-14 11:00:06       0.0
4 2017-04-14 11:00:06  290083.0
