Use Predict on data.table with Linear Regression
Question
With regard to this post, I have created an example to experiment with linear regression using the data.table package, as follows:
## rm(list=ls()) # anti-social
library(data.table)
set.seed(1011)
DT = data.table(group=c("b","b","b","a","a","a"),
v1=rnorm(6),v2=rnorm(6), y=rnorm(6))
setkey(DT, group)
ans <- DT[,as.list(coef(lm(y~v1+v2))), by = group]
which returns:
group (Intercept) v1 v2
1: a 1.374942 -2.151953 -1.355995
2: b -2.292529 3.029726 -9.894993
I am able to obtain the coefficients of the lm function.
My question is: how can we directly apply predict to new observations? Suppose we have the following new observations:
new <- data.table(group=c("b","b","b","a","a","a"),v1=rnorm(6),v2=rnorm(6))
I tried:
setkey(new, group)
DT[,predict(lm(y~v1+v2), new), by = group]
but it returns strange answers:
group V1
1: a -2.525502
2: a 3.319445
3: a 4.340253
4: a 3.512047
5: a 2.928245
6: a 1.368679
7: b -1.835744
8: b -3.465325
9: b 19.984160
10: b -14.588933
11: b 11.280766
12: b -1.132324
Thanks.
Recommended answer
You are predicting onto the entire new data set each time: within every group, the fitted model is applied to all six rows of new, which is why the result has 12 rows instead of 6. If you want to predict only on the new data for each group, you need to subset the "newdata" by group.
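A quick way to see this is to count the rows returned per group. The sketch below rebuilds the data from the question; the name wrong is only an illustrative label for the result of the original call.

```r
library(data.table)

set.seed(1011)
DT  <- data.table(group = c("b","b","b","a","a","a"),
                  v1 = rnorm(6), v2 = rnorm(6), y = rnorm(6))
new <- data.table(group = c("b","b","b","a","a","a"),
                  v1 = rnorm(6), v2 = rnorm(6))
setkey(DT, group)
setkey(new, group)

# The original call: each group's model is applied to every row of `new`
wrong <- DT[, predict(lm(y ~ v1 + v2), new), by = group]

# Rows returned per group: 6 each (all of `new`), so 12 rows in total
wrong[, .N, by = group]
nrow(new)   # 6
```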
This is an instance where .BY will be useful. Because both tables are keyed on group, indexing with .BY selects the rows of the current group. Here are two possibilities:
a <- DT[,predict(lm(y ~ v1 + v2), new[.BY]), by = group]
b <- new[,predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata=.SD),by = group]
Both give identical results:
identical(a,b)
# [1] TRUE
a
# group V1
#1: a -2.525502
#2: a 3.319445
#3: a 4.340253
#4: b -14.588933
#5: b 11.280766
#6: b -1.132324
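If the predictions should sit next to the new observations rather than in a separate table, the second form can probably be combined with `:=`. This is a minimal sketch, assuming the same DT and new as in the question; the column name pred is an arbitrary choice for illustration and is not part of the original answer.

```r
library(data.table)

set.seed(1011)
DT  <- data.table(group = c("b","b","b","a","a","a"),
                  v1 = rnorm(6), v2 = rnorm(6), y = rnorm(6))
new <- data.table(group = c("b","b","b","a","a","a"),
                  v1 = rnorm(6), v2 = rnorm(6))
setkey(DT, group)
setkey(new, group)

# For each group in `new`, fit the model on that group's rows of DT
# and store the predictions for that group's new rows in a column.
# `pred` is a hypothetical column name.
new[, pred := predict(lm(y ~ v1 + v2, data = DT[.BY]), newdata = .SD),
    by = group]

new
```

Each row of new then carries its own prediction, so no later join back to the inputs is needed.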