predict() and newdata - How does this work?

Question

Someone recently posted a question on this paper here: https://static.googleusercontent.com/media/www.google.com/en//googleblogs/pdfs/google_predicting_the_present.pdf

The R code can be found at the very end of the paper. Essentially, the paper investigates one-month-ahead predictions of sales using search queries. I think I understand the model and method, but one detail puzzles me. It is this part:

##### (1) Divide the data into two parts: model fitting & prediction
dat1 = mdat[1:(nrow(mdat)-1), ]
dat2 = mdat[nrow(mdat), ]

##### (2) Fit the model
fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1);
summary(fit)

and:

#### (3) Prediction for the next month
predict.fit = predict(fit, newdata=dat2, se.fit=TRUE);

I understand that dat2 in (1) is just the last row of mdat, and that (2) fits the regression model to everything but the last row of the dataset.

But why is newdata=dat2 used in the prediction step (3), and what does it mean? Why only the last row?

Recommended answer

Here is a description of each line of the code:

dat1 = mdat[1:(nrow(mdat)-1), ] 

Creates a subset of the whole dataset which contains all but the last row.

dat2 = mdat[nrow(mdat), ]

Creates a subset of the whole dataset which contains only the last row.

fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1)

Only the first subset, dat1, is used to fit the model, i.e. the data without the last row.
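
As a quick check of the split, the sketch below builds a stand-in for mdat from simulated numbers. The values are invented, and the meaning given to each column in the comments is an assumption; only the column names sales, s1, s12 and trends1 come from the code in the question. It confirms that the fitted model has seen every row except the last.

```r
# Simulated stand-in for mdat; the numbers are made up for illustration
set.seed(42)
n <- 24
mdat <- data.frame(
  s1      = runif(n, 80, 120),  # assumed: sales lagged by one month
  s12     = runif(n, 80, 120),  # assumed: sales lagged by twelve months
  trends1 = rnorm(n)            # assumed: search-query index
)
mdat$sales <- exp(0.5 * log(mdat$s1) + 0.4 * log(mdat$s12) +
                  0.1 * mdat$trends1 + rnorm(n, sd = 0.05))

dat1 <- mdat[1:(nrow(mdat) - 1), ]  # all but the last row
dat2 <- mdat[nrow(mdat), ]          # the last row only

fit <- lm(log(sales) ~ log(s1) + log(s12) + trends1, data = dat1)

nrow(dat1)  # 23 rows used for fitting
nrow(dat2)  # 1 row held back
nobs(fit)   # 23: the last row played no part in the fit
```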

predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)

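predict takes the fitted model and computes what it would predict for the "unseen" data dat2. Since dat2 is the one row that was held back from fitting, the result is an out-of-sample prediction for that row.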
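
To make the role of newdata concrete, here is a minimal sketch on made-up data (the data frame toy and its columns x and y are hypothetical, not from the paper). Without newdata, predict returns the fitted values for the rows the model was trained on; with newdata, it evaluates the fitted model at the predictor values in the supplied rows. With se.fit=TRUE, the result for an lm fit is a list rather than a plain vector.

```r
# Hypothetical toy data, not from the paper
set.seed(1)
toy <- data.frame(x = 1:10)
toy$y <- 2 * toy$x + rnorm(10)

train <- toy[1:(nrow(toy) - 1), ]  # all but the last row
last  <- toy[nrow(toy), ]          # the last row only

fit_toy <- lm(y ~ x, data = train)

# No newdata: fitted values for the 9 training rows
in_sample <- predict(fit_toy)
length(in_sample)

# With newdata: one prediction for the held-out row, plus its standard error
out_sample <- predict(fit_toy, newdata = last, se.fit = TRUE)
names(out_sample)   # "fit" "se.fit" "df" "residual.scale"
out_sample$fit      # predicted y for the last row
out_sample$se.fit   # standard error of that prediction
```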

In the simplest case, with only one independent variable, we would fit a line to dat1 and then see which Y-value that line predicts for the X-value in dat2.
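
A sketch of that one-variable case, again on hypothetical data (the names d, x, y and all numbers are invented for illustration): the prediction for the held-out row is simply the fitted intercept plus the fitted slope times that row's X-value.

```r
# Hypothetical data: y is roughly linear in x
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0))

d1 <- d[1:(nrow(d) - 1), ]  # rows used to fit the line
d2 <- d[nrow(d), ]          # row whose Y-value we pretend not to know

line_fit <- lm(y ~ x, data = d1)

# Prediction via predict()
p <- predict(line_fit, newdata = d2)

# The same prediction computed by hand from the coefficients
b <- coef(line_fit)
p_manual <- b[["(Intercept)"]] + b[["x"]] * d2$x

all.equal(unname(p), p_manual)  # TRUE
d2$y - unname(p)                # prediction error for the held-out row
```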
