Rolling mean with changing window size on a large dataset
Question
I want to compute the rolling mean over a vector whereby the window grows with each entry in the vector. Basically, I want the mean of all elements up to the i-th, (i+1)-th, (i+2)-th, and so forth.
To make this clearer, here is an example and a solution that works for smaller datasets but does not scale well:
library(zoo)
# data:
x <- 1:100
# solution:
rolling_average <- rollapply(x, seq_along(x), mean, align = "right")
# result:
rolling_average
# [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5
# [27] 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5
# [53] 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5
# [79] 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0 50.5
Using this approach for a vector with 500000 entries fills up my memory within seconds and renders my PC unusable. Alternatively, I've tried using roll_mean from RcppRoll, but wasn't able to come up with a solution because RcppRoll::roll_mean only accepts integers as window lengths.
So, what is the best approach to solve this problem on a large scale? Any help is greatly appreciated.
Recommended answer
We can divide the cumulative sum by the running element count:
cumsum(x) / seq_along(x)
# [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5
# [21] 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5
# [41] 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5
# [61] 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5 40.0 40.5
# [81] 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0 50.5
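This works because the mean of the first i elements is their sum divided by i, so a single pass with cumsum replaces the growing windows. One caveat when scaling up: cumsum on an integer vector returns integers, and for something like 1:500000 the running sum exceeds R's integer range and yields NA with a warning. The following is a minimal sketch, assuming a conversion to double is acceptable; the helper name cumulative_mean is hypothetical and not part of the original answer.

```r
# Hypothetical helper (name chosen for illustration only).
cumulative_mean <- function(x) {
  # Convert to double so the running sum of a large integer vector
  # does not overflow R's 32-bit integer range.
  x <- as.numeric(x)
  cumsum(x) / seq_along(x)
}

# Check against a naive per-index definition on a small input.
x_small <- 1:100
naive <- sapply(seq_along(x_small), function(i) mean(x_small[1:i]))
stopifnot(isTRUE(all.equal(cumulative_mean(x_small), naive)))

# The same call on a vector with 500000 entries.
x_large <- 1:500000
result <- cumulative_mean(x_large)
```

For data that is already stored as double (for example, measurements read from a file), the conversion is a no-op and the one-liner above behaves the same way.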