What does predict.glm(, type="terms") actually do?
Question
I am confused by the way the predict.glm function in R works. According to the help,
项"选项返回一个矩阵,该矩阵给出线性预测变量上模型公式中每个项的拟合值.
The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
Thus, if my model has form f(y) = X*beta, then command
predict(model, X, type='terms')
is expected to produce the same matrix X, multiplied by beta element-wise. For example, if I train the following model
test.data = data.frame(y = c(0,0,0,1,1,1,1,1,1), x=c(1,2,3,1,2,2,3,3,3))
model = glm(y~(x==1)+(x==2), family = 'binomial', data = test.data)
the resulting coefficients are
beta <- model$coef
and the design matrix is
X <- model.matrix(y~(x==1)+(x==2), data = test.data)
(Intercept) x == 1TRUE x == 2TRUE
1 1 1 0
2 1 0 1
3 1 0 0
4 1 1 0
5 1 0 1
6 1 0 1
7 1 0 0
8 1 0 0
9 1 0 0
Then, multiplied by the coefficients, it should look like
pred1 <- t(beta * t(X))
(Intercept) x == 1TRUE x == 2TRUE
1 1.098612 -1.098612 0.0000000
2 1.098612 0.000000 -0.4054651
3 1.098612 0.000000 0.0000000
4 1.098612 -1.098612 0.0000000
5 1.098612 0.000000 -0.4054651
6 1.098612 0.000000 -0.4054651
7 1.098612 0.000000 0.0000000
8 1.098612 0.000000 0.0000000
9 1.098612 0.000000 0.0000000
However, the actual matrix produced by predict.glm seems unrelated to this. The following code
pred2 <- predict(model, test.data, type = 'terms')
x == 1 x == 2
1 -0.8544762 0.1351550
2 0.2441361 -0.2703101
3 0.2441361 0.1351550
4 -0.8544762 0.1351550
5 0.2441361 -0.2703101
6 0.2441361 -0.2703101
7 0.2441361 0.1351550
8 0.2441361 0.1351550
9 0.2441361 0.1351550
attr(,"constant")
[1] 0.7193212
How does one interpret such a result?
Answer
I have already edited your question to include the "correct" way of getting the (raw) model matrix, the model coefficients, and your intended term-wise prediction, so your other question on how to obtain these is already solved. In the following, I shall help you understand predict.glm().
predict.glm() (actually, predict.lm()) applies centring constraints to each model term when doing term-wise prediction.
Initially, you have a model matrix
X <- model.matrix(y~(x==1)+(x==2), data = test.data)
but it is centred, by subtracting the column means:
avx <- colMeans(X)
X1 <- sweep(X, 2L, avx)
> avx
(Intercept) x == 1TRUE x == 2TRUE
1.0000000 0.2222222 0.3333333
> X1
(Intercept) x == 1TRUE x == 2TRUE
1 0 0.7777778 -0.3333333
2 0 -0.2222222 0.6666667
3 0 -0.2222222 -0.3333333
4 0 0.7777778 -0.3333333
5 0 -0.2222222 0.6666667
6 0 -0.2222222 0.6666667
7 0 -0.2222222 -0.3333333
8 0 -0.2222222 -0.3333333
9 0 -0.2222222 -0.3333333
Then the term-wise computation is done using this centred model matrix:
t(beta*t(X1))
(Intercept) x == 1TRUE x == 2TRUE
1 0 -0.8544762 0.1351550
2 0 0.2441361 -0.2703101
3 0 0.2441361 0.1351550
4 0 -0.8544762 0.1351550
5 0 0.2441361 -0.2703101
6 0 0.2441361 -0.2703101
7 0 0.2441361 0.1351550
8 0 0.2441361 0.1351550
9 0 0.2441361 0.1351550
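As a check (a minimal, self-contained sketch that re-creates the model from the question), the non-intercept columns of this centred computation match what predict(, type = "terms") returns, value for value:

```r
# Rebuild the data and model from the question
test.data <- data.frame(y = c(0,0,0,1,1,1,1,1,1), x = c(1,2,3,1,2,2,3,3,3))
model <- glm(y ~ (x == 1) + (x == 2), family = "binomial", data = test.data)

beta <- model$coef
X <- model.matrix(y ~ (x == 1) + (x == 2), data = test.data)
X1 <- sweep(X, 2L, colMeans(X))            # centre each column

manual <- t(beta * t(X1))[, -1]            # drop the all-zero intercept column
pred2 <- predict(model, test.data, type = "terms")

# Compare the raw values, ignoring column names and the "constant" attribute
all.equal(as.vector(manual), as.vector(pred2))
# TRUE
```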
After centring, each term column is vertically shifted to have zero mean. As a result, the intercept column becomes 0. No worries: a new intercept is computed by aggregating the shifts of all model terms:
intercept <- as.numeric(crossprod(avx, beta))
# [1] 0.7193212
Now you should see what predict.glm(, type = "terms") gives you.
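In particular, no information is lost by the centring: summing each row of the term matrix and adding back the "constant" attribute recovers the linear predictor X * beta exactly (a small sketch, again re-creating the model from the question):

```r
# Rebuild the data and model from the question
test.data <- data.frame(y = c(0,0,0,1,1,1,1,1,1), x = c(1,2,3,1,2,2,3,3,3))
model <- glm(y ~ (x == 1) + (x == 2), family = "binomial", data = test.data)

pred2 <- predict(model, test.data, type = "terms")

# Row sums of the centred terms, shifted by the stored constant,
# reassemble the linear predictor
eta <- rowSums(pred2) + attr(pred2, "constant")

all.equal(unname(eta), unname(predict(model, test.data, type = "link")))
# TRUE
```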