Computing a rolling mean in data.table with adaptive window lengths
Question
I am looking to compute a moving average by group in a data.table with an adaptive window, so that there are no NAs at the beginning of each group's time series. I know how to do this with frollmean by setting adaptive = TRUE (see, for instance, jangorecki's response in this thread). The same code works when all groups in my data.table have the same length, but it fails with an error when the groups differ in size.
For example, if my data is
library(data.table)
tmp <- data.table(Gp = c(rep('A',6),rep('B',4)), Val = c(1,3,4,6,2,2,8,5,7,10))
and I am computing a moving average of length 3, then the desired output is:
> desired_output
Gp Val
1: A 1.00
2: A 2.00
3: A 2.67
4: A 4.33
5: A 4.00
6: A 3.33
7: B 8.00
8: B 6.50
9: B 6.67
10: B 7.33
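For reference, the desired output is a trailing mean over at most three values that restarts at each group boundary, so the first rows of a group average over fewer values instead of returning NA. The following is a minimal base-R sketch (no data.table) that reproduces those numbers; the helper name adaptive_trailing_mean is hypothetical and only meant to make the definition concrete.

```r
# Hypothetical helper: trailing mean over at most k values ending at each position.
adaptive_trailing_mean <- function(x, k) {
  vapply(seq_along(x), function(i) mean(x[max(1L, i - k + 1L):i]), numeric(1))
}

Gp  <- c(rep("A", 6), rep("B", 4))
Val <- c(1, 3, 4, 6, 2, 2, 8, 5, 7, 10)

# ave() applies the function within each group and returns the results
# in the original row order, so windows never cross from A into B.
res <- ave(Val, Gp, FUN = function(v) adaptive_trailing_mean(v, 3L))
round(res, 2)
```

Rounded to two decimals, these values match the Val column of desired_output. This loop is quadratic in the worst case and is only meant as a readable check, not as a replacement for frollmean.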
I tried the following:
mov_window_len = vector("list",2)
mov_window_len[[1]] = c(1,2,rep(3,4))
mov_window_len[[2]] = c(1,2,rep(3,2))
tmp[,lapply(.SD, frollmean, n = mov_window_len, align = "right", adaptive = TRUE), by = Gp]
but I get an error saying: "length of integer vector(s) provided as list to 'n' argument must be equal to number of observations provided in 'x'".
Any help in resolving this will be much appreciated. Thanks in advance.
Answer
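You can use the group index .GRP to subset mov_window_len. This gives each group the vector of window lengths that matches its own number of rows. Passing the whole list, as in the question, hands both vectors to every group, so the call for group A (6 rows) also receives the length-4 vector meant for group B, which is what triggers the error. Because mov_window_len[.GRP] uses single brackets, it is still a list containing one vector, which frollmean accepts for n; this relies on the list elements being in the same order in which the groups appear (A, then B). You only want to take frollmean of Val, so there is no need for lapply.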
tmp[, frollmean(Val, n = mov_window_len[.GRP], align = "right", adaptive = TRUE), by = Gp]
# Gp V1
# 1: A 1.000000
# 2: A 2.000000
# 3: A 2.666667
# 4: A 4.333333
# 5: A 4.000000
# 6: A 3.333333
# 7: B 8.000000
# 8: B 6.500000
# 9: B 6.666667
# 10: B 7.333333
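A variation on the answer above, assuming the window lengths always follow the pattern 1, 2, 3, 3, ... within every group: instead of building mov_window_len by hand (and keeping it in the same order as the groups), each group can compute its own widths from .N. This is a sketch; k is a name introduced here for illustration.

```r
library(data.table)

tmp <- data.table(Gp  = c(rep("A", 6), rep("B", 4)),
                  Val = c(1, 3, 4, 6, 2, 2, 8, 5, 7, 10))
k <- 3L  # maximum window length (illustrative name)

# Within each group, seq_len(.N) is 1, 2, ..., .N; capping it at k gives the
# widths 1, 2, k, k, ..., so the widths always match the group's row count.
res <- tmp[, .(RollVal = frollmean(Val, n = pmin(seq_len(.N), k),
                                   adaptive = TRUE)),
           by = Gp]
res
```

Because the widths are derived from each group's own size, groups of any length work without a separate list, and the result should match the V1 column shown above.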
Alternatively, the window length can be added to the input data.table as a column (the Len field below), since each length corresponds to one row. Note that := adds Len to tmp by reference.
tmp[Gp=="A", Len:=mov_window_len[[1]]
][Gp=="B", Len:=mov_window_len[[2]]
][, .(Val, Len, RollVal=frollmean(Val, Len, adaptive=TRUE)), by=Gp]
# Gp Val Len RollVal
# 1: A 1 1 1.000000
# 2: A 3 2 2.000000
# 3: A 4 3 2.666667
# 4: A 6 3 4.333333
# 5: A 2 3 4.000000
# 6: A 2 3 3.333333
# 7: B 8 1 8.000000
# 8: B 5 2 6.500000
# 9: B 7 3 6.666667
#10: B 10 3 7.333333
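If the lengths follow the same 1, 2, 3, 3, ... pattern in every group, the Len column from the second approach can also be filled without one assignment per group. A minimal sketch, assuming data.table's rowid() helper (which numbers rows within each group) is available in your version:

```r
library(data.table)

tmp <- data.table(Gp  = c(rep("A", 6), rep("B", 4)),
                  Val = c(1, 3, 4, 6, 2, 2, 8, 5, 7, 10))

# rowid(Gp) numbers rows 1, 2, 3, ... within each group (in row order),
# so capping it at 3 gives each row's window length directly.
tmp[, Len := pmin(rowid(Gp), 3L)]

# Adaptive rolling mean per group; Len holds one window length per row.
tmp[, RollVal := frollmean(Val, Len, adaptive = TRUE), by = Gp]
tmp
```

Keeping by = Gp means each window only sees rows from its own group, even if the rows of a group are not stored contiguously.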