How to apply rolling functions in a group by object in pandas

Views: 141  Posted: 2018/5/30 13:58:06  Tags: python pandas group-by dataframe apply


Problem description

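I'm having difficulty solving a look-back or rolling problem in a data frame, or perhaps in a groupby.

The following is a simple example of the data frame I have:

              fruit   amount
    20140101  apple   3
    20140102  apple   5
    20140102  orange  10
    20140104  banana  2
    20140104  apple   10
    20140104  orange  4
    20140105  orange  6
    20140105  grape   1
    ...
    20141231  apple   3
    20141231  grape   2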

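For every day, I need to calculate the average 'amount' of each fruit over the previous 3 days, and create the following data frame:

              fruit   average_in_last 3 days
    20140104  apple   4
    20140104  orange  10
    ...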

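For example, on 20140104 the previous 3 days are 20140101, 20140102 and 20140103 (note that the dates in the data frame are not continuous and 20140103 does not exist). The average amount for apple is (3 + 5) / 2 = 4, for orange it is 10 / 1 = 10, and for the rest it is 0.

The sample data frame is very simple, but the actual data frame is much larger and more complicated. I hope someone can shed some light on this. Thank you in advance!

Solution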

Assuming we start with a data frame like this:

    >>> df
                 fruit  amount
    2017-06-01   apple       1
    2017-06-03   apple      16
    2017-06-04   apple      12
    2017-06-05   apple       8
    2017-06-06   apple      14
    2017-06-08   apple       1
    2017-06-09   apple       4
    2017-06-02  orange      13
    2017-06-03  orange       9
    2017-06-04  orange       9
    2017-06-05  orange       2
    2017-06-06  orange      11
    2017-06-07  orange       6
    2017-06-08  orange       3
    2017-06-09  orange       3
    2017-06-10  orange      13
    2017-06-02   grape      14
    2017-06-03   grape      16
    2017-06-07   grape       4
    2017-06-09   grape      15
    2017-06-10   grape       5
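The answer does not show how `df` is built. The following is a minimal sketch of one way to construct it, assuming the index holds plain `datetime.date` objects with no index name; that assumption is inferred from the later `reindex` against a list of `date` objects and from the `'index'` / `'level_1'` column names, not stated in the answer. The names `rows` and `records` are hypothetical helpers.

```python
import datetime as dt

import pandas as pd

# (date, fruit, amount) rows copied from the frame shown above.
rows = [
    ("2017-06-01", "apple", 1), ("2017-06-03", "apple", 16),
    ("2017-06-04", "apple", 12), ("2017-06-05", "apple", 8),
    ("2017-06-06", "apple", 14), ("2017-06-08", "apple", 1),
    ("2017-06-09", "apple", 4),
    ("2017-06-02", "orange", 13), ("2017-06-03", "orange", 9),
    ("2017-06-04", "orange", 9), ("2017-06-05", "orange", 2),
    ("2017-06-06", "orange", 11), ("2017-06-07", "orange", 6),
    ("2017-06-08", "orange", 3), ("2017-06-09", "orange", 3),
    ("2017-06-10", "orange", 13),
    ("2017-06-02", "grape", 14), ("2017-06-03", "grape", 16),
    ("2017-06-07", "grape", 4), ("2017-06-09", "grape", 15),
    ("2017-06-10", "grape", 5),
]

records = pd.DataFrame(rows, columns=["date", "fruit", "amount"])

# Assumption: store datetime.date objects (not Timestamps) in the index,
# so that reindexing against a list of date objects finds matching labels.
records["date"] = pd.to_datetime(records["date"]).dt.date

df = records.set_index("date")
df.index.name = None  # an unnamed index becomes 'index' / 'level_1' on reset_index()
```

Dates repeat across fruits, so the index is not unique for the frame as a whole; it only needs to be unique within each fruit for the per-group `reindex` to work.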
 
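    >>> import pandas as pd
    >>> dates = [i.date() for i in pd.date_range('2017-06-01', '2017-06-10')]

    >>> temp = (df.groupby('fruit')['amount']
    ...           .apply(lambda x: x.reindex(dates)  # fill in the missing dates for each group
    ...                             .fillna(0)       # fill each missing date with 0
    ...                             .rolling(3)
    ...                             .sum())          # rolling sum over 3 consecutive days
    ...           .reset_index()
    ...           .rename(columns={'amount': 'sum_of_3_days',
    ...                            'level_1': 'date'}))  # rename the date index level to a 'date' column

    >>> temp.head()
       fruit        date  sum_of_3_days
    0  apple  2017-06-01            NaN
    1  apple  2017-06-02            NaN
    2  apple  2017-06-03           17.0
    3  apple  2017-06-04           28.0
    4  apple  2017-06-05           36.0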
 
    >>> # convert the date index into a date column
    >>> df = df.reset_index().rename(columns={'index': 'date'})
    >>> # assign the result: merge returns a new frame and does not modify df in place
    >>> df = df.merge(temp, on=['fruit', 'date'])
    >>> df
              date   fruit  amount  sum_of_3_days
    0   2017-06-01   apple       1            NaN
    1   2017-06-03   apple      16           17.0
    2   2017-06-04   apple      12           28.0
    3   2017-06-05   apple       8           36.0
    4   2017-06-06   apple      14           34.0
    5   2017-06-08   apple       1           15.0
    6   2017-06-09   apple       4            5.0
    7   2017-06-02  orange      13            NaN
    8   2017-06-03  orange       9           22.0
    9   2017-06-04  orange       9           31.0
    10  2017-06-05  orange       2           20.0
    11  2017-06-06  orange      11           22.0
    12  2017-06-07  orange       6           19.0
    13  2017-06-08  orange       3           20.0
    14  2017-06-09  orange       3           12.0
    15  2017-06-10  orange      13           19.0
    16  2017-06-02   grape      14            NaN
    17  2017-06-03   grape      16           30.0
    18  2017-06-07   grape       4            4.0
    19  2017-06-09   grape      15           19.0
    20  2017-06-10   grape       5           20.0
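Note that the session above produces a rolling sum over three consecutive days that includes the current day, whereas the question asks for the mean over the previous three days, counting only days that have data. The following is a sketch of a different technique that may fit the question more closely: a time-based window, `rolling("3D", closed="left")`, applied per group. It assumes the dates are parsed into pandas Timestamps; the frame `data` reuses the question's sample rows, and the column name `average_in_last_3_days` is a choice made here.

```python
import pandas as pd

# Sample rows from the question (the elided "..." rows are left out).
data = pd.DataFrame({
    "date": pd.to_datetime(
        ["20140101", "20140102", "20140102", "20140104",
         "20140104", "20140104", "20140105", "20140105"],
        format="%Y%m%d",
    ),
    "fruit": ["apple", "apple", "orange", "banana",
              "apple", "orange", "orange", "grape"],
    "amount": [3, 5, 10, 2, 10, 4, 6, 1],
})

prev_mean = (
    data.sort_values("date")            # time-based windows need ordered dates
        .set_index("date")
        .groupby("fruit")["amount"]
        # "3D" with closed="left" covers [t - 3 days, t): the three previous
        # calendar days, excluding the current day itself.
        .rolling("3D", closed="left")
        .mean()
        .fillna(0)                      # no data in the window -> 0, as in the question
        .rename("average_in_last_3_days")
        .reset_index()
)
```

On the question's sample, the rows for 20140104 come out as 4 for apple and 10 for orange, matching the expected result; missing dates need no reindexing because the window is defined by time rather than by row count.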

