Rolling regression with dplyr


Question

I have a data frame with the columns "date", "company" and "return", which can be reproduced with the code below:

library(dplyr)
library(zoo)  # provides rollapply(), which is used further down
n.dates <- 60
n.stocks <- 2
date <- seq(as.Date("2011-07-01"), by=1, len=n.dates)
symbol <- replicate(n.stocks, paste0(sample(LETTERS, 5), collapse = ""))
x <- expand.grid(date, symbol)
x$return <- rnorm(n.dates*n.stocks, 0, sd = 0.05)
names(x) <- c("date", "company", "return")

With this data frame, I can calculate the daily market average return and add the result as a new column, "market.ret".

x <- group_by(x, date)
## use the bare column name so the mean is taken within each date group, not over the whole column
x <- mutate(x, market.ret = mean(return, na.rm = TRUE))

Now I want to group the data by company (2 companies in this case).

x <- group_by(x, company)

After doing this, I would like to regress "return" on "market.ret", compute the linear regression coefficient, and store the slope in a new column. If I want to fit the whole data set for a given company, I can simply call lm():

group_by(x, company) %>%
do(data.frame(beta = coef(lm(return ~ market.ret,data = .))[2])) %>%
left_join(x,.)

However, I actually want to run the linear regression on a "rolling" basis, i.e. separately for each day over a 20-day trailing window. I want to use rollapply() but do not know how to pass two columns into the function. Any help or suggestion is greatly appreciated.

Note: below is the code I used to calculate the 20-day rolling standard deviation of returns, which might be helpful:

sdnoNA <- function(x){return(sd(x, na.rm = TRUE))}
x <- mutate(x, sd.20.0.d = rollapply(return, FUN = sdnoNA, width = 20, fill = NA))


Recommended answer

## lms computes the slope of a simple linear regression of y on x,
## using only the pairs where both values are finite
lms <- function(y, x){
s = which(is.finite(x * y))
y = y[s]
x = x[s]
return(cov(x, y)/var(x))
}
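
As a quick sanity check on why lms() can stand in for lm() here: for a regression with one predictor and an intercept, the fitted slope equals cov(x, y) / var(x). The sketch below compares the two on made-up data; the names mkt and ret and the simulated values are assumptions for illustration only, not part of the original question.

```r
## Minimal check that cov(x, y) / var(x) matches the slope reported by lm().
## mkt and ret are hypothetical simulated series, used only for this comparison.
set.seed(42)
mkt <- rnorm(20, 0, 0.05)                  # hypothetical market returns
ret <- 0.8 * mkt + rnorm(20, 0, 0.01)      # hypothetical stock returns

slope_closed <- cov(mkt, ret) / var(mkt)   # closed-form slope, as in lms()
slope_lm <- unname(coef(lm(ret ~ mkt))[2]) # slope from a full lm() fit

print(all.equal(slope_closed, slope_lm))
```

The closed form avoids fitting a full model object in every window, which is likely to matter when the function is called once per row.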

## z is a dataframe which stores our final result
z <- data.frame()

## x has to be ungrouped
x <- ungroup(x)

## subset with "filter" and roll with "rollapply"
symbols <- unique(x$company)
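for (i in seq_along(symbols)) {
  temp <- filter(x, company == symbols[i])
  ## pass both columns as one matrix; by.column = FALSE hands each 20-row window
  ## to FUN as a whole, so column 1 is return (y) and column 2 is market.ret (x)
  z <- rbind(z, mutate(temp, beta = rollapply(as.matrix(temp[, c("return", "market.ret")]),
                                              FUN = function(w) lms(w[, 1], w[, 2]),
                                              width = 20, fill = NA,
                                              by.column = FALSE, align = "right")))
}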

## final result
print(z)
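
The loop above could also be written as a grouped dplyr pipeline, which may be closer to what the question asks for. The following is only a sketch under some assumptions: roll_beta() is a hypothetical helper name, the data are regenerated with fixed company labels ("AAA", "BBB") so the example is self-contained, and missing values are not handled the way lms() handles them.

```r
library(dplyr)
library(zoo)

## Hypothetical helper: trailing rolling slope of y on x.
## cbind(y, x) lets rollapply() see both series in each window.
roll_beta <- function(y, x, width = 20) {
  rollapply(cbind(y, x), width = width, by.column = FALSE,
            fill = NA, align = "right",
            FUN = function(w) cov(w[, 2], w[, 1]) / var(w[, 2]))
}

## Simulated data in the same shape as the question (labels are made up).
set.seed(1)
df <- expand.grid(date = seq(as.Date("2011-07-01"), by = 1, length.out = 60),
                  company = c("AAA", "BBB"))
df$return <- rnorm(nrow(df), 0, 0.05)

res <- df %>%
  group_by(date) %>%
  mutate(market.ret = mean(return, na.rm = TRUE)) %>%  # daily cross-sectional mean
  group_by(company) %>%
  arrange(date, .by_group = TRUE) %>%                  # windows must follow date order
  mutate(beta = roll_beta(return, market.ret)) %>%     # evaluated once per company
  ungroup()

print(tail(res))
```

Because mutate() runs on each company group separately, the windows never span two companies, and the first 19 rows of each company are NA since a full 20-day window is not yet available.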
