Lagged regression in R: determining the optimal lag
Question
I have a variable that is believed to be a good predictor of another variable, but with some lag. I don't know what the lag is and want to estimate it from the data.
Here is an example:
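library(tidyverse)

data <- tibble(
  id = 1:100,
  y = dnorm(1:100, 30, 20) * 1000,
  # predictor aligned with y, plus a little uniform noise
  x.shifted = y / 10 + runif(100) / 10,
  # observed predictor: x.shifted delayed by 30 rows (first 30 values are NA)
  x.actual = lag(x.shifted, 30)
)

# black: x.shifted, blue: x.actual, red: y
data %>%
  ggplot(aes(id, x.shifted)) +
  geom_point() +
  geom_point(aes(id, x.actual), color = 'blue') +
  geom_point(aes(id, y), color = 'red')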
The model lm(y ~ x.actual, data) would not be a great fit, but the model lm(y ~ x.shifted, data) would be. Here, I know that x must be shifted by -30 days, but imagine I did not, and all I knew was that the shift is somewhere between -30 and +30.
The immediate approach that comes to mind is to run 61 regression models, from one that shifts x by -30 to one that shifts it by +30, and then pick the model with the best AIC or BIC. However, (a) is this the correct approach, and (b) are there R packages that already do this and find the optimal lag?
Recommended answer
What you are describing is the cross-correlation of the two variables. You can do this very easily in R with ccf.
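As a rough illustration of the ccf route, here is a minimal sketch on made-up data (not the data from the question). The series x and y and the value true_lag are assumptions, chosen so that the answer is known in advance; only base R is used. Per ?ccf, the lag k returned by ccf(x, y) estimates the correlation between x[t+k] and y[t], so a y that trails x by 7 steps should show up as a peak at lag -7.

```r
set.seed(1)                      # assumption: fixed seed for reproducibility
n        <- 500
true_lag <- 7                    # hypothetical delay, chosen for illustration

x <- rnorm(n)
# y[t] = x[t - true_lag] + small noise; the first true_lag values are NA
y <- c(rep(NA, true_lag), x[1:(n - true_lag)]) + rnorm(n, sd = 0.1)

ok <- !is.na(y)                  # ccf() fails on NA by default, so drop them
cc <- ccf(x[ok], y[ok], lag.max = 30, plot = FALSE)

# lag with the largest absolute cross-correlation
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag                         # -7 here: x leads y by 7 steps
```

Dropping plot = FALSE draws the usual correlogram, which is handy for seeing whether the peak is sharp or broad. On short, strongly trending series such as the example in the question, the ccf peak can be less clear-cut, so it is worth checking it against a direct cor sweep.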
However, to just get the optimal lag, we can simplify this to a one-liner: use sapply to feed each candidate lag into the cor function, then use which.max to find the lag with the highest correlation:
# lag() here must be dplyr::lag (loaded with tidyverse), not stats::lag
which.max(sapply(1:50, function(i) cor(data$x.actual, dplyr::lag(data$y, i), use = "complete.obs")))
#> [1] 30
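The one-liner only tries positive lags. If the shift could go either way, as in the question's range of -30 to +30, the same idea can be sketched with a signed shift. The helper shift_by below is hypothetical (written here in base R so the block runs without dplyr), and the data are rebuilt the same way as in the question with an assumed seed. For a single predictor, R² from lm equals the squared correlation on the same rows, so this sweep ranks lags the same way the per-lag R² would.

```r
set.seed(123)                    # assumption: fixed seed for reproducibility
n <- 100
y         <- dnorm(1:n, 30, 20) * 1000
x.shifted <- y / 10 + runif(n) / 10
x.actual  <- c(rep(NA, 30), x.shifted[1:(n - 30)])  # like dplyr::lag(x.shifted, 30)

# shift_by(): hypothetical helper. k > 0 lags v by k steps, k < 0 leads it.
shift_by <- function(v, k) {
  m <- length(v)
  if (k > 0) {
    c(rep(NA, k), v[seq_len(m - k)])
  } else if (k < 0) {
    c(v[(1 - k):m], rep(NA, -k))
  } else {
    v
  }
}

lags <- -30:30
# correlation between x.actual and y shifted by each candidate lag
cors <- sapply(lags, function(k) {
  cor(x.actual, shift_by(y, k), use = "complete.obs")
})

best_lag <- lags[which.max(cors)]
best_lag                         # 30 for this simulated series
```

One caution about choosing by AIC or BIC instead: each shift leaves a different number of complete rows, and information criteria are only comparable between models fitted to the same observations, so the fits would first need to be restricted to a common set of rows.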