1 Year Rolling mean pandas on column date

Problem description

I would like to compute the 1-year rolling average for each row of the DataFrame below.

test:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2018-03-12  2.854749e-06
2316    7034    2018-03-09  3.907458e-06
2317    7034    2018-03-08  1.662412e-06
2318    7034    2018-03-07  1.346433e-06
2319    7034    2018-03-06  8.731700e-06
2320    7034    2018-03-05  7.145597e-06
2321    7034    2018-03-02  4.893283e-06
...

For example, I need to compute:

  • mean of variation of 7034 between 2018-03-14 and 2017-08-14
  • mean of variation of 7034 between 2018-03-13 and 2017-08-13

I tried:

test.groupby(['id','date'])['variation'].rolling(window=1,freq='Y',on='date').mean()

But I get the error message:

ValueError: invalid on specified as date, must be a column (if DataFrame) or None

How can I use the pandas rolling() function in this case?

Thanks Sacul,

I tested:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']

But freq='Y' doesn't work (I got: ValueError: Invalid frequency: Y), so I used window=365, freq='D' instead.

But there is another issue: because there are never 365 consecutive dates for each id-date pair, the result is always empty. Even if there are missing dates, I would like to ignore them and consider all dates between the current date and the current date minus 365 days to compute the rolling mean. For instance, imagine I have:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2017-03-13  2.854749e-06

Then:

  • for 7034 2018-03-14: I would like to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)
  • for 7034 2018-03-13: I would also like to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)

How can I do that?

Finally, I used the formulas below to calculate the rolling median, mean and standard deviation over 1 year while ignoring missing values:

pd.rolling_median(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_mean(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_std(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
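
The pd.rolling_* functions used above come from old pandas releases and are not available in recent ones. As a hedged sketch of the same idea with the current API, a time-based window such as '365D' on a datetime index looks back 365 days from each row and uses whatever observations fall inside, so gaps in the dates do not matter. The sample values below are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Illustrative data only (hypothetical values, not the asker's dataset).
df = pd.DataFrame({
    "id": [7034, 7034, 7034, 7035],
    "date": ["2018-03-14", "2018-03-13", "2017-06-01", "2018-03-14"],
    "variation": [4.0, 2.0, 6.0, 10.0],
})
df["date"] = pd.to_datetime(df["date"])

# Time-based windows expect the datetime index to be in increasing order.
df = df.sort_values("date")

grouped = df.set_index("date").groupby("id")["variation"]

# "365D" means: all rows of the same id within the 365 days up to and
# including the current row's date. min_periods=1 keeps short windows.
rolling_mean = grouped.rolling("365D", min_periods=1).mean()
rolling_median = grouped.rolling("365D", min_periods=1).median()
rolling_std = grouped.rolling("365D", min_periods=1).std()

print(rolling_mean)
```

Each result is a Series indexed by (id, date), so it can be joined back onto the original rows if needed. Note that the window only looks backwards from each date.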

Thanks a lot for your help!

Thomas

Recommended answer

I believe this should work for you:

# First make sure that `date` is a datetime object:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='A').mean()['variation']

Using pd.DataFrame.rolling with datetimes works well when the date is the index, which is why I used df.set_index('date') (as can be seen in one of the documentation's examples).

I can't really test whether it works for the yearly average on your example dataframe, as there is only one year and only one ID, but it should work.

As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:

df.set_index('date').groupby('id')['variation'].resample('A').mean()
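
One thing worth keeping in mind is that resampling by year groups rows into calendar years and returns one value per id per year, rather than a trailing 12-month value for every row. The sketch below illustrates that bucketing on made-up data; it groups on the year taken from the date column instead of a frequency alias, because the spelling of the year-end alias has changed across pandas versions. That substitution is an assumption made for portability, not what the answer itself uses.

```python
import pandas as pd

# Illustrative data only (hypothetical values).
df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": pd.to_datetime(["2018-03-14", "2018-03-13", "2017-06-01"]),
    "variation": [4.0, 2.0, 6.0],
})

# One mean per (id, calendar year): 2017 and 2018 are separate buckets,
# so the 2017 row does not contribute to the 2018 average.
yearly_mean = df.groupby(["id", df["date"].dt.year])["variation"].mean()

print(yearly_mean)
```

If a value is needed for every row based on the preceding 365 days, a time-based rolling window is probably the closer match.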

You can check this link for more about the frequency arguments.
