How to aggregate and plot data from a pandas DataFrame?

Problem description

I have this DataFrame:

df[['payout_date','total_value']].head(10)

    payout_date         total_value
0   2017-02-14T11:00:06  177.313
1   2017-02-14T11:00:06  0.000
2   2017-02-01T00:00:00  0.000
3   2017-02-14T11:00:06  47.392
4   2017-02-14T11:00:06  16.254
5   2017-02-14T11:00:06  125.818
6   2017-02-14T11:00:06  0.000
7   2017-02-14T11:00:06  0.000
8   2017-02-14T11:00:06  0.000
9   2017-02-14T11:00:06  0.000

I am using this code to plot the sum of total_value by day (and by month) within a specific date range, but it plots a bar for each total_value and doesn't sum total_value by day.

(df.set_index('payout_date')
                    .loc['2018-02-01':'2018-02-02']
                    .groupby('payout_date')
                    .agg(['sum'])
                    .reset_index()
                    .plot(x='payout_date', y='total_value',kind="bar"))
plt.show()

The data is not aggregated; I get a bar for each value in df.

How to aggregate total_value by date and by month?

I tried the answers from this and a couple of other similar questions, but none of them worked for the date format used here.

I also tried adding .dt.to_period('M') to the code, but then I get the error TypeError: Empty 'DataFrame': no numeric data to plot.

Solution

Setup

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'payout_date': {0: '2017-02-01T11:00:06', 1: '2017-02-01T11:00:06',
                    2: '2017-02-02T00:00:00', 3: '2017-02-14T11:00:06',
                    4: '2017-02-14T11:00:06', 5: '2017-02-15T11:00:06',
                    6: '2017-02-15T11:00:06', 7: '2017-02-16T11:00:06',
                    8: '2017-02-16T11:00:06', 9: '2017-02-16T11:00:06'},
    'total_value': {0: 177.313, 1: 22.0, 2: 25.0, 3: 47.391999999999996,
                    4: 16.254, 5: 125.818, 6: 85.0, 7: 42.0, 8: 22.0, 9: 19.0},
})

Use normalize to just group by day:

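# Select total_value explicitly so the string column payout_date is not aggregated too.
df.groupby(pd.DatetimeIndex(df.payout_date).normalize())['total_value'].sum().reset_index()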

  payout_date  total_value
0  2017-02-01      199.313
1  2017-02-02       25.000
2  2017-02-14       63.646
3  2017-02-15      210.818
4  2017-02-16       83.000
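
To see why this groups by day, here is a minimal self-contained sketch. It assumes the payout_date strings parse with pd.to_datetime; the names sample, stamps, day_keys and daily_totals are illustrative only and the three rows are made up in the same shape as the question's data.

```python
import pandas as pd

# Hypothetical three-row frame in the same shape as the question's data.
sample = pd.DataFrame({
    'payout_date': ['2017-02-01T11:00:06', '2017-02-01T18:30:00',
                    '2017-02-02T00:00:00'],
    'total_value': [177.313, 22.0, 25.0],
})

# Parse the ISO 8601 strings into datetime values once.
stamps = pd.to_datetime(sample['payout_date'])

# normalize() keeps the date and resets the time part to 00:00:00,
# so rows from the same day share one group key.
day_keys = stamps.dt.normalize()

# Sum only the numeric column within each day.
daily_totals = sample.groupby(day_keys)['total_value'].sum()
print(daily_totals)
```

The two rows on 2017-02-01 have different times, but after normalize() they fall under the same key and are summed together.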

Extend the previous command to plot:

daily = df.groupby(
    pd.DatetimeIndex(df.payout_date)
    .normalize().strftime('%Y-%m-%d'))['total_value'].sum()

# The index already holds the date labels, so plot the Series directly.
daily.plot(kind='bar')

plt.tight_layout()
plt.show()

For my sample data, this produces one bar per day.
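
The question also asks for totals by month. One possible approach, sketched here under the assumption that payout_date parses with pd.to_datetime, is to group by a monthly period instead of a normalized day. The names sample, month_keys and monthly_totals are illustrative and the rows are made up.

```python
import pandas as pd

# Hypothetical rows spanning two months, same columns as the question.
sample = pd.DataFrame({
    'payout_date': ['2017-01-31T09:00:00', '2017-02-01T11:00:06',
                    '2017-02-14T11:00:06', '2017-02-16T11:00:06'],
    'total_value': [10.0, 177.313, 47.392, 42.0],
})

# Convert to datetimes, then to monthly periods such as 2017-02.
month_keys = pd.to_datetime(sample['payout_date']).dt.to_period('M')

# Select total_value before summing so only the numeric column is aggregated.
monthly_totals = sample.groupby(month_keys)['total_value'].sum()

# Plain string labels are convenient as bar-chart tick labels.
monthly_totals.index = monthly_totals.index.astype(str)
print(monthly_totals)
```

Calling monthly_totals.plot(kind='bar') followed by plt.show() should then draw one bar per month. Selecting total_value explicitly keeps the plotted data numeric.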

If you want to apply this on a subset, you can do something like the following:

tmp = df.loc[(df.payout_date > '2017-02-01') & (df.payout_date < '2017-02-15')]

tmp.groupby(
    pd.DatetimeIndex(tmp.payout_date)                     \
    .normalize().strftime('%Y-%m-%d'))['total_value']     \
    .agg(['sum'])

# Result
                sum
2017-02-01  199.313
2017-02-02   25.000
2017-02-14   63.646

This sums only the rows in your desired range.
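
The filter above compares strings, which works here because ISO 8601 timestamps in a single format sort chronologically. As an alternative sketch, the range can be expressed with real datetimes. This assumes the strings parse with pd.to_datetime and uses a half-open interval (start included, end excluded), which differs slightly from the strict > above. The names sample, stamps, in_range and subset_totals are illustrative.

```python
import pandas as pd

# Hypothetical rows in the same shape as the question's data.
sample = pd.DataFrame({
    'payout_date': ['2017-02-01T11:00:06', '2017-02-02T00:00:00',
                    '2017-02-14T11:00:06', '2017-02-15T11:00:06'],
    'total_value': [177.313, 25.0, 47.392, 125.818],
})

stamps = pd.to_datetime(sample['payout_date'])

# Half-open interval: start is included, end is excluded.
start, end = pd.Timestamp('2017-02-01'), pd.Timestamp('2017-02-15')
in_range = (stamps >= start) & (stamps < end)

# Group the filtered rows by day and sum the numeric column.
subset_totals = (
    sample.loc[in_range]
          .groupby(stamps[in_range].dt.normalize())['total_value']
          .sum()
)
print(subset_totals)
```

Comparing datetimes rather than strings keeps the filter correct even if the column later mixes timestamp formats.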
