Adding lagged variables to an lm model?

Question

I'm using lm on a time series, which actually works quite well, and it's super fast.

Suppose my model is:

> formula <- y ~ x

I train this on a training set:

> train <- data.frame( x = seq(1,3), y = c(2,1,4) )
> model <- lm( formula, train )

... and I can make predictions for new data:

> test <- data.frame( x = seq(4,6) )
> test$y <- predict( model, newdata = test )
> test
  x        y
1 4 4.333333
2 5 5.333333
3 6 6.333333

This works super nicely, and it's really speedy.

I want to add lagged variables to the model. Now, I could do this by augmenting my original training set:

> train$y_1 <- c(0, train$y[1:(nrow(train)-1)])  # parentheses needed: 1:nrow(train)-1 is (1:nrow(train))-1
> train
  x y y_1
1 1 2   0
2 2 1   2
3 3 4   1
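The same augmentation can be generalised to any lag order. The sketch below uses a hypothetical helper, `lag_col()`, that shifts a vector down by `k` rows. It pads with `NA` rather than 0, on the assumption that the first observations have no known history; `lm()` then drops those incomplete rows by default instead of treating 0 as a real past value.

```r
# Hypothetical helper (not from any package): shift v down by k positions, padding with NA
lag_col <- function(v, k = 1) {
  stopifnot(k >= 0)
  c(rep(NA, k), v)[seq_along(v)]
}

train <- data.frame(x = seq(1, 3), y = c(2, 1, 4))
train$y_1 <- lag_col(train$y, 1)   # NA 2 1
train$y_2 <- lag_col(train$y, 2)   # NA NA 2
train
```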

Then update the formula:

> formula <- y ~ x * y_1

... and training will work just fine:

> model <- lm( formula, train )
> # no errors here

However, the problem is that I then can't use predict, because there is no way of populating y_1 in the test set in one batch: each row's y_1 is the previous row's y, which for a test set is itself a prediction.
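A common workaround (not part of the original answer, and not something `lm()` does for you) is recursive one-step-ahead prediction: predict one test row at a time and feed each prediction back in as the next row's `y_1`. In this sketch the training series is simulated purely for illustration, and `predict_recursive()` is a hypothetical helper name. Note that errors compound, because later rows are conditioned on predicted rather than observed lags.

```r
# Hypothetical training series (simulated); substitute your own data
set.seed(42)
n <- 40
x <- seq_len(n)
y <- numeric(n)
y[1] <- 1
for (t in 2:n) y[t] <- 0.5 + 0.05 * x[t] + 0.6 * y[t - 1] + rnorm(1, sd = 0.1)

train <- data.frame(x = x, y = y)
train$y_1 <- c(NA, head(train$y, -1))   # first row has no lag; lm() drops it
model <- lm(y ~ x * y_1, data = train)

# Hypothetical helper: predict row by row, using each prediction as the next lag
predict_recursive <- function(model, newdata, last_y) {
  preds <- numeric(nrow(newdata))
  prev <- last_y                        # last observed y from the training data
  for (i in seq_len(nrow(newdata))) {
    row <- newdata[i, , drop = FALSE]
    row$y_1 <- prev                     # lagged value for this step
    preds[i] <- predict(model, newdata = row)
    prev <- preds[i]                    # becomes y_1 for the next row
  }
  preds
}

test <- data.frame(x = seq(n + 1, n + 3))
test$y <- predict_recursive(model, test, last_y = tail(train$y, 1))
test
```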

Now, for lots of other regression features there are very convenient ways to express them in the formula, such as poly(x, 2), and these work directly on the unmodified training and test data.
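For comparison, this is the behaviour being asked for: `poly()` is evaluated inside the formula, and `predict()` re-applies the same polynomial basis (its coefficients are stored with the fitted model) to the new x values, so neither data frame needs extra columns. With only the three training points from above, the quadratic passes through them exactly, so the predictions are simply that parabola evaluated at x = 4, 5, 6.

```r
train <- data.frame(x = seq(1, 3), y = c(2, 1, 4))
test  <- data.frame(x = seq(4, 6))

# poly(x, 2) is computed from the raw x column at fit time and again at predict time
model_poly <- lm(y ~ poly(x, 2), data = train)
predict(model_poly, newdata = test)   # 11 22 37, i.e. 7 - 7x + 2x^2
```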

So, is there some way of expressing lagged variables in the formula so that predict can be used? Ideally:

formula <- y ~ x * lag(y,-1)
model <- lm( formula, train )
test$y <- predict( model, newdata = test )

... without having to augment (not sure if that's the right word) the training and test datasets, and just being able to use predict directly?
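One reason the wished-for formula cannot work as written is that base R's `stats::lag()` does not shift the values of a plain vector. It only moves the time-series attribute (`tsp`), which `lm()` does not use, so inside the formula `lag(y, -1)` would hold exactly the same numbers as `y`. A quick check:

```r
y <- c(2, 1, 4)
shifted <- stats::lag(y, -1)

# The values are unchanged; only the (start, end, frequency) attribute moves
as.numeric(shifted)   # 2 1 4
tsp(shifted)          # 2 4 1
```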

Answer

Have a look at, e.g., the dynlm package, which gives you lag operators such as L() for use in the formula. More generally, the CRAN Task Views on Econometrics and Time Series will have lots more for you to look at.

Here is the beginning of its examples: a regression on a one-month and a twelve-month lag:

R>      library("dynlm")
R>      data("UKDriverDeaths", package = "datasets")
R>      uk <- log10(UKDriverDeaths)
R>      dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))
R>      dfm

Time series regression with "ts" data:
Start = 1970(1), End = 1984(12)

Call:
dynlm(formula = uk ~ L(uk, 1) + L(uk, 12))

Coefficients:
(Intercept)     L(uk, 1)    L(uk, 12)  
      0.183        0.431        0.511  

R> 
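If you would rather stay with plain `lm()`, a rough base-R equivalent of the dynlm call above is to align the series with its lags using `ts.intersect()` before fitting. This is a sketch; the column names `uk1` and `uk12` are arbitrary choices. Because `ts.intersect()` keeps only the overlapping period (1970(1) to 1984(12)), the estimation sample matches the one dynlm reports, so the coefficients should agree with those shown above.

```r
# Base R only: build a data set of the series and its 1- and 12-month lags
uk <- log10(UKDriverDeaths)   # UKDriverDeaths ships with the datasets package
d <- ts.intersect(uk, uk1 = stats::lag(uk, -1), uk12 = stats::lag(uk, -12))

# lm() needs a data frame; the mts columns keep the names given above
fit <- lm(uk ~ uk1 + uk12, data = as.data.frame(d))
coef(fit)
```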
