predict() and newdata - How does this work?
Question
Someone recently posted a question on this paper here: https://static.googleusercontent.com/media/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf
The R code of the paper can be found at the very end of the paper. Essentially, the paper investigates one-month ahead predictions of sales through search queries. I think I understood the model and method, but there's one detail that puzzles me. It's the part:
##### (1) Divide data by two parts - model fitting & prediction
dat1 = mdat[1:(nrow(mdat)-1), ]
dat2 = mdat[nrow(mdat), ]
##### (2) Fit Model;
fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1);
summary(fit)
and:
#### (3) Prediction for the next month;
predict.fit = predict(fit, newdata=dat2, se.fit=TRUE);
I do understand that dat2 in (1) is only the last row of mdat, and that (2) fits the regression model to everything but the last row of the dataset.
But why is newdata=dat2 used in the prediction step (3), and what does it mean? Why only the last row?
Recommended answer
Here is a description for each line of the code:
dat1 = mdat[1:(nrow(mdat)-1), ]
Creates a subset of the whole dataset which contains all but the last row.
dat2 = mdat[nrow(mdat), ]
Creates a subset of the whole dataset which contains only the last row.
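To make the split concrete, here is a minimal sketch on a made-up data frame. The toy mdat below is purely hypothetical and only stands in for the paper's data; the two indexing expressions are the ones from the question.

```r
# Hypothetical toy data standing in for mdat (not the paper's data)
mdat <- data.frame(month = 1:6, sales = c(10, 12, 11, 14, 15, 17))

dat1 <- mdat[1:(nrow(mdat) - 1), ]  # rows 1 to 5: used for model fitting
dat2 <- mdat[nrow(mdat), ]          # row 6 only: held out for prediction

nrow(dat1)  # 5
nrow(dat2)  # 1
```

Both pieces are still data frames with the same columns as mdat, which is what lm() and predict() expect.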
fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1)
Only the first subset, dat1, is used to fit the model, i.e. the data without the last row.
predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)
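predict takes the fitted model and computes what it would predict for the "unseen" data dat2. Because the last row was left out of the fit, this amounts to a one-month-ahead prediction for that row.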
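The effect of newdata can be sketched on simulated data. Everything about the data below is an assumption made for illustration: the values are random, and only the column names are taken from the formula in the question. Without newdata, predict() returns fitted values for the rows used in the fit; with newdata=dat2 it returns a prediction for the held-out row only.

```r
# Simulated stand-in for mdat: values are made up, only the column names
# follow the formula in the question.
set.seed(1)
n <- 30
mdat <- data.frame(
  s1      = runif(n, 50, 100),  # assumed: a positive lagged-sales column
  s12     = runif(n, 50, 100),  # assumed: a second positive lagged-sales column
  trends1 = rnorm(n)            # assumed: a search-query index
)
mdat$sales <- exp(0.5 * log(mdat$s1) + 0.4 * log(mdat$s12) +
                  0.1 * mdat$trends1 + rnorm(n, sd = 0.05))

dat1 <- mdat[1:(nrow(mdat) - 1), ]
dat2 <- mdat[nrow(mdat), ]

fit <- lm(log(sales) ~ log(s1) + log(s12) + trends1, data = dat1)

# Without newdata: fitted values for the rows the model was trained on
in_sample <- predict(fit)
length(in_sample)        # one value per row of dat1

# With newdata: a prediction for the held-out last row only
predict.fit <- predict(fit, newdata = dat2, se.fit = TRUE)
predict.fit$fit          # predicted log(sales) for the last row
predict.fit$se.fit       # standard error of the predicted mean
exp(predict.fit$fit)     # simple back-transform to the sales scale
```

With se.fit=TRUE the result is a list rather than a plain vector, so the prediction itself sits in the fit element. Since the response is log(sales), the predicted value is on the log scale; exp() is one simple way to bring it back to the original units.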
In the simplest case, with only one independent variable, we would fit a line to dat1 and then look up which Y-value it predicts for the X-value in dat2.
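A minimal sketch of that one-variable case, assuming made-up points that lie exactly on the line y = 2x + 1 (the column names x and y are hypothetical):

```r
# Made-up points lying exactly on y = 2x + 1
mdat <- data.frame(x = 1:6, y = 2 * (1:6) + 1)

dat1 <- mdat[1:(nrow(mdat) - 1), ]  # x = 1..5, used to fit the line
dat2 <- mdat[nrow(mdat), ]          # x = 6, held out

fit <- lm(y ~ x, data = dat1)

# predict() plugs the x-value of dat2 into the fitted line
pred <- predict(fit, newdata = dat2)

# The same number computed by hand: intercept + slope * x
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * dat2$x

unname(pred)  # 13, up to floating-point rounding
by_hand       # same value
```

The y column of dat2 is not used by predict(); only the predictor columns named in the formula are read from newdata.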