1 Year Rolling mean pandas on column date

Published: 2020/5/24 2:35:02  Tags: python pandas rolling-computation rolling-average


Question

I would like to compute the 1-year rolling average for each row of the DataFrame below:

test:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2018-03-12  2.854749e-06
2316    7034    2018-03-09  3.907458e-06
2317    7034    2018-03-08  1.662412e-06
2318    7034    2018-03-07  1.346433e-06
2319    7034    2018-03-06  8.731700e-06
2320    7034    2018-03-05  7.145597e-06
2321    7034    2018-03-02  4.893283e-06
...

For example, I need to compute:

the mean of variation for id 7034 between 2018-03-14 and 2017-08-14
the mean of variation for id 7034 between 2018-03-13 and 2017-08-13

and so on.

I tried:

test.groupby(['id','date'])['variation'].rolling(window=1,freq='Y',on='date').mean()

But I get this error:

ValueError: invalid on specified as date, must be a column (if DataFrame) or None

How can I use the pandas rolling() function in this case?
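The error comes from on='date': on must name a column of the DataFrame being rolled, but after selecting ['variation'] the grouped object is a Series, which has no date column. Grouping by ['id', 'date'] would also put every row in its own group, so the window could never span more than one row. A minimal sketch, assuming a pandas version with time-based windows (0.19 or later) and using the three rows from the second example further down: a time axis can be given either with on='date' on a DataFrame that still has the column, or by moving date into the index, as the answer below does. Both forms expect dates sorted in ascending order.

```python
import pandas as pd

# Three rows for id 7034, sorted by date (time-based windows need sorted dates).
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-13", "2018-03-13", "2018-03-14"]),
    "variation": [2.854749e-06, 4.953194e-07, 4.139148e-06],
})

# Option 1: keep 'date' as a column and point rolling at it with on=.
via_on = df.rolling("365D", on="date").mean()["variation"]

# Option 2: move 'date' into the index; the offset window then uses the index.
via_index = df.set_index("date")["variation"].rolling("365D").mean()

print(via_on.to_numpy())
print(via_index.to_numpy())
```

Both produce the same values; the only difference is whether the result keeps the original row labels (option 1) or is indexed by date (option 2).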

Thanks, Sacul.

I tested:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']

But freq='Y' doesn't work (I got ValueError: Invalid frequency: Y), so I used window=365, freq='D' instead.
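An integer window counts rows, not days: window=365 only yields a value once 365 rows are available (unless min_periods is lowered), which is consistent with the empty result described next. Passing an offset string such as '365D' instead makes the window cover a span of time, however many rows happen to fall inside it. A small sketch on an illustrative series with gaps (the values are made up for the example):

```python
import pandas as pd

# Illustrative series with gaps between dates (not data from the question).
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-05", "2018-01-06"]),
)

# Integer window: the last 3 *rows*; NaN until 3 rows are available.
by_rows = s.rolling(3).mean()

# Offset window: every row dated within the last 3 *days*, i.e. in (t - 3 days, t].
by_time = s.rolling("3D").mean()

print(pd.DataFrame({"by_rows": by_rows, "by_time": by_time}))
```

Here by_rows starts with two NaN values, while by_time has a value on every row, and on 2018-01-05 it only averages that day because 2018-01-02 lies exactly 3 days back, outside the half-open window.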

But there is another issue: because there are never 365 consecutive dates for any id, the result is always empty. Even if there are missing dates, I would like to ignore them and use all dates between the current date and the current date minus 365 days to compute the rolling mean. For instance, imagine I have:

index   id      date        variation
2313    7034    2018-03-14  4.139148e-06
2314    7034    2018-03-13  4.953194e-07
2315    7034    2017-03-13  2.854749e-06

Then:

for 7034 on 2018-03-14, I want to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)
for 7034 on 2018-03-13, I also want to compute MEAN(4.139148e-06, 4.953194e-07, 2.854749e-06)

How can I do this?
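One way to get a per-row, per-id trailing 1-year mean that simply skips missing dates is a time-based window applied to each id separately. A minimal sketch, assuming the column names from the question, unique row labels and a reasonably recent pandas; rolling_one_year is a hypothetical helper name. closed='both' reads "between the current date and the current date - 365" as including both endpoints. Because such a window only looks backwards, the 2018-03-13 row cannot see the 2018-03-14 value, and 2017-03-13 lies 366 days before 2018-03-14, so the results differ from the MEAN(...) of all three values written above.

```python
import pandas as pd


def rolling_one_year(df: pd.DataFrame, window: str = "365D") -> pd.Series:
    """Trailing time-based mean of 'variation' per 'id', aligned to df's rows.

    Hypothetical helper: within each id, every value dated in
    [date - window, date] contributes, however many dates are missing.
    Assumes df has unique row labels.
    """
    pieces = []
    for _, group in df.groupby("id"):
        g = group.sort_values("date")  # offset windows need sorted dates
        means = (
            g.set_index("date")["variation"]
            .rolling(window, closed="both")  # include both window endpoints
            .mean()
        )
        # Re-attach the results to this group's original row labels.
        pieces.append(pd.Series(means.to_numpy(), index=g.index))
    return pd.concat(pieces).reindex(df.index)


test = pd.DataFrame(
    {
        "id": [7034, 7034, 7034],
        "date": pd.to_datetime(["2018-03-14", "2018-03-13", "2017-03-13"]),
        "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06],
    },
    index=[2313, 2314, 2315],
)
test["rolling_mean"] = rolling_one_year(test)
print(test)
```

With this data, the 2018-03-14 row gets the mean of the two 2018 values, and the 2018-03-13 row gets the mean of its own value and the 2017-03-13 value. Looping over groups is not the most compact form, but it avoids depending on how a particular pandas version orders or checks dates inside grouped rolling windows.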

Finally, I used the formulas below to compute the 1-year rolling median, mean and standard deviation, ignoring missing values:

pd.rolling_median(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_mean(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

pd.rolling_std(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)

Thank you very much for your help!

Thomas
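The pd.rolling_median, pd.rolling_mean and pd.rolling_std functions used in the final solution above (along with the freq argument) were deprecated in pandas 0.18 in favor of the .rolling() method and have since been removed, so those lines fail on current versions. A possible modern equivalent, sketched under the assumption of a recent pandas and the question's column names: a '365D' offset window on a date index takes the place of window=365, freq='D', and min_periods=1 is already the default for offset-based windows.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": pd.to_datetime(["2017-03-13", "2018-03-13", "2018-03-14"]),
    "variation": [2.854749e-06, 4.953194e-07, 4.139148e-06],
})

# Offset windows require dates in ascending order within each id.
grouped = (
    df.sort_values(["id", "date"])
    .set_index("date")
    .groupby("id")["variation"]
    .rolling("365D")  # every row dated in (date - 365 days, date]
)

# Replacements for pd.rolling_median / pd.rolling_mean / pd.rolling_std.
stats = pd.DataFrame({
    "median": grouped.median(),
    "mean": grouped.mean(),
    "std": grouped.std(),  # sample std (ddof=1); NaN when a window holds one value
})
print(stats)
```

The result is indexed by (id, date). With this data, the std column is NaN for the first two rows, since each of their windows contains a single value.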

Recommended Answer


I believe this should work for you:

# First make sure that `date` is a datetime object:

df['date'] = pd.to_datetime(df['date'])

df.set_index('date').groupby('id').rolling(window=1, freq='A').mean()['variation']

Using pd.DataFrame.rolling with datetimes works well when the date is the index, which is why I used df.set_index('date') (as can be seen in one of the documentation's examples).

I can't really test whether it computes the yearly average correctly on your example DataFrame, since it contains only one year and only one id, but it should work.


As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:

df.set_index('date').groupby('id')['variation'].resample('A').mean()

You can check this link about the frequency argument.
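Note that resample('A') returns one value per id and calendar year rather than one value per row, so it is not a rolling window: a row dated 2018-03-14 would not see values from March 2017. pandas 2.2 also deprecated the 'A' alias in favor of 'YE'. A sketch of the same calendar-year buckets using groupby on the year, which sidesteps the alias question; unlike resample, it labels buckets with the year instead of the year-end date and emits no NaN rows for years without data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": pd.to_datetime(["2017-03-13", "2018-03-13", "2018-03-14"]),
    "variation": [2.854749e-06, 4.953194e-07, 4.139148e-06],
})

# One mean per (id, calendar year): the same buckets resample('A') uses,
# minus the empty in-between years that resample would fill with NaN.
yearly = df.groupby(["id", df["date"].dt.year])["variation"].mean()
print(yearly)
```

Here 2017 holds a single value and 2018 averages the two March 2018 rows, whichever row within the year you are looking at.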
