Rolling mean with changing window size on a large dataset


Problem description


I want to compute a rolling mean over a vector where the window grows with each entry. In other words, the i-th result should be the mean of all elements from the first up to the i-th, the next result the mean up to the (i+1)-th, and so forth.


To make it clearer, I'll provide an example and a solution that works for smaller datasets but does not scale well:

library(zoo)

# data:
x <- 1:100

# solution:
rolling_average <- rollapply(x, seq_along(x), mean, align = "right")

# result:
rolling_average
# [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5
# [27] 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5
# [53] 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5
# [79] 40.0 40.5 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0 50.5
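To make explicit what the growing-window call above computes, here is a minimal base-R sketch of the same expanding mean. The helper name expanding_mean_naive is hypothetical, and this is not how zoo implements rollapply internally; it simply re-averages the whole prefix x[1:i] for every i, which is the quadratic pattern that stops scaling.

```r
# Hypothetical helper: naive expanding mean that re-averages every prefix.
# The i-th result is mean(x[1], ..., x[i]); total work grows quadratically in length(x).
expanding_mean_naive <- function(x) {
  vapply(seq_along(x), function(i) mean(x[seq_len(i)]), numeric(1))
}

x <- 1:100
res <- expanding_mean_naive(x)
head(res)
# [1] 1.0 1.5 2.0 2.5 3.0 3.5
```

For x <- 1:100 each prefix mean is (i + 1) / 2, so this reproduces the output shown above.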


Using this approach on a vector with 500,000 entries fills up my memory within seconds and renders my PC unusable. I also tried roll_mean from RcppRoll, but couldn't find a solution, because RcppRoll::roll_mean only accepts a single integer as the window length.
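Part of the scaling problem is plain arithmetic. Assuming each window mean is computed from scratch, window i reads i elements, so the total is 1 + 2 + ... + n = n(n + 1)/2 element reads. This sketch does not explain the memory use reported above; it only shows why the running time grows quadratically with the vector length.

```r
# Element reads needed if every growing window is averaged from scratch
# (assumption: no reuse of work between consecutive windows).
n <- 5e5
total_reads <- n * (n + 1) / 2
format(total_reads, big.mark = ",", scientific = FALSE)
```

That is on the order of 10^11 reads for 500,000 entries, versus roughly n for any method that reuses the previous window's sum.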


So, what is the best approach to solve this problem on a large scale? Any help is greatly appreciated.

Recommended answer

We can divide the cumulative sum by the running count of elements:

cumsum(x) / seq_along(x)
#  [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0 10.5
# [21] 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5
# [41] 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5
# [61] 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5 40.0 40.5
# [81] 41.0 41.5 42.0 42.5 43.0 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0 50.5
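Why this works: cumsum(x)[i] is x[1] + ... + x[i], and seq_along(x)[i] is i, so the element-wise ratio is exactly the mean of the first i elements, computed in a single linear pass instead of one pass per window. The following is a minimal check against a naive prefix mean on random data; the seed and vector sizes are arbitrary choices for illustration.

```r
# Expanding mean via cumulative sum: one pass, linear time and memory.
set.seed(42)
x <- rnorm(1000)
fast <- cumsum(x) / seq_along(x)

# Naive reference: re-average every prefix (quadratic, fine for small n).
naive <- vapply(seq_along(x), function(i) mean(x[seq_len(i)]), numeric(1))

# The two agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(fast, naive)))

# The same one-liner on a vector of the size mentioned in the question.
big <- runif(5e5)
big_mean <- cumsum(big) / seq_along(big)
length(big_mean)
# [1] 500000
```

If x contains NA, cumsum propagates it, so every result from the first NA onward is NA; this matches what mean() without na.rm returns for those prefixes.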
