Rolling by group in data.table R
Problem description
I'm trying to roll my function through a data.table by group and am running into problems. I'm not sure whether I should change my function or whether my call is wrong. Here is a simple example:
Data
library(data.table)
test <- data.table(return=c(0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2),
                   sec=c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"))
My function
library(zoo)  # provides rollapply
zoo_fun <- function(dt, N) {
  (rollapply(dt$return + 1, N, FUN=prod, fill=NA, align='right') - 1)
}
Running it (I want to create a new column, momentum3, which is the product of the latest 3 observations, each with one added, minus one, computed per security, so grouping with by=sec):
test[, momentum3 := zoo_fun(test, 3), by=sec]
Warning messages:
1: In `[.data.table`(test, , `:=`(momentum3, zoo_fun(test, 3)), by = sec) :
RHS 1 is length 10 (greater than the size (5) of group 1). The last 5 element(s) will be discarded.
2: In `[.data.table`(test, , `:=`(momentum3, zoo_fun(test, 3)), by = sec) :
RHS 1 is length 10 (greater than the size (5) of group 2). The last 5 element(s) will be discarded.
I get these warnings, and the result is not what I expected:
> test
    return sec momentum3
 1:    0.1   A        NA
 2:    0.1   A        NA
 3:    0.1   A     0.331
 4:    0.1   A     0.331
 5:    0.1   A     0.331
 6:    0.2   B        NA
 7:    0.2   B        NA
 8:    0.2   B     0.331
 9:    0.2   B     0.331
10:    0.2   B     0.331
I was expecting the rows for sec B to be filled with 0.728 ((1.2*1.2*1.2) - 1), with two NAs at the start. What am I doing wrong? Is it that rolling functions won't work with grouping?
Recommended answer
Because you pass test itself to the function and then use dt$return, the whole column (all 10 rows) is picked up inside every group instead of only that group's 5 rows, which is what the warnings are telling you. Just pass the column you need to the function and it will work fine:
#use the column instead of the data.table
zoo_fun <- function(column, N) {
(rollapply(column + 1, N, FUN=prod, fill=NA, align='right') - 1)
}
#now it works fine
test[, momentum := zoo_fun(return, 3), by = sec]
As a separate note, you should probably not use return as a column or variable name, since return is also the name of a base R function.
Output:
> test
    return sec momentum
 1:    0.1   A       NA
 2:    0.1   A       NA
 3:    0.1   A    0.331
 4:    0.1   A    0.331
 5:    0.1   A    0.331
 6:    0.2   B       NA
 7:    0.2   B       NA
 8:    0.2   B    0.728
 9:    0.2   B    0.728
10:    0.2   B    0.728
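For anyone who wants to check the explanation rather than take it on trust, here is a minimal sketch. It rebuilds the same toy data with the column renamed to ret (a hypothetical name, chosen only to avoid return), compares the length of the full-table column with the length of the per-group column, and then shows one possible way to keep a table-style function signature by passing .SD (the per-group subset) instead of the whole table. The names ret, sizes, and zoo_fun_dt are illustrative and not part of the original post.

```r
library(data.table)
library(zoo)

# Same toy data as in the question; `ret` is a hypothetical rename of `return`.
test <- data.table(ret = rep(c(0.1, 0.2), each = 5),
                   sec = rep(c("A", "B"), each = 5))

# 1. Why the original call warned: `test$ret` always refers to the full
#    table, while a bare column name inside j only holds the current
#    group's rows.
sizes <- test[, .(group_rows   = .N,
                  whole_table  = length(test$ret),
                  group_column = length(ret)),
              by = sec]
print(sizes)

# 2. A variant that still takes a table: the function body is unchanged
#    apart from the column name, but the caller passes .SD (this group's
#    rows only) rather than the full table.
zoo_fun_dt <- function(dt, N) {
  rollapply(dt$ret + 1, N, FUN = prod, fill = NA, align = "right") - 1
}
test[, momentum3 := zoo_fun_dt(.SD, 3), by = sec, .SDcols = "ret"]
print(test)
```

Passing the column directly, as in the answer above, is usually the simpler choice; the .SD form may be handy when the function needs more than one column from each group.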