How to do a GLM when "contrasts can be applied only to factors with 2 or more levels"?

Question

I want to fit a regression in R using glm, but I get the contrasts error shown below. Is there a way to fit the model anyway?

mydf <- data.frame(Group=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
                   WL=rep(c(1,0),12), 
                   New.Runner=c("N","N","N","N","N","N","Y","N","N","N","N","N","N","Y","N","N","N","Y","N","N","N","N","N","Y"), 
                   Last.Run=c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))

mod <- glm(formula = WL~New.Runner+Last.Run, family = binomial, data = mydf)
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels

Recommended answer

Using the debug_contr_error and debug_contr_error2 functions defined in the answer to How to debug "contrasts can be applied only to factors with 2 or more levels" error?, we can easily locate the problem: only a single level is left in the variable New.Runner. Every row where New.Runner is "Y" has NA in Last.Run, so glm drops those rows and only "N" remains.

info <- debug_contr_error2(WL ~ New.Runner + Last.Run, mydf)

info[c(2, 3)]
#$nlevels
#New.Runner 
#         1 
#
#$levels
#$levels$New.Runner
#[1] "N"

## the data frame that is actually used by `glm`
dat <- info$mf
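
The same diagnosis can be sketched in base R without the helper functions. This is only an illustration, not the implementation of debug_contr_error2: it rebuilds the model frame that glm would use, assuming the default na.action (which drops rows containing NA), and counts the distinct values of each non-numeric predictor. The names mf, is_categorical and n_values are made up for this sketch.

```r
# Same data as in the question (Group is omitted because the model does not use it)
mydf <- data.frame(
  WL = rep(c(1, 0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA)
)

# Rows that survive NA removal, i.e. the rows glm() would actually fit on
mf <- model.frame(WL ~ New.Runner + Last.Run, data = mydf)
nrow(mf)   # 20 of the 24 rows remain

# Count distinct observed values of every character or factor column
is_categorical <- vapply(mf, function(x) is.character(x) || is.factor(x), logical(1))
n_values <- vapply(mf[is_categorical], function(x) length(unique(x)), integer(1))
n_values   # New.Runner has 1 distinct value left
```

Any predictor reported with fewer than 2 distinct values will trigger the contrasts error.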

Contrasts cannot be applied to a single-level factor, since any kind of contrasts represents a factor with one column fewer than it has levels. With 1 - 1 = 0 columns, this variable would be dropped from the model matrix.
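
As a small illustration of that column count (a sketch using the contrast functions that ship with R; the names contrast_funs and n_cols are made up here), each contrast matrix for a 3-level factor has 3 rows and 2 columns:

```r
# The standard contrast functions from the stats package
contrast_funs <- list(
  treatment = contr.treatment,
  sum       = contr.sum,
  helmert   = contr.helmert,
  poly      = contr.poly
)

# Number of model-matrix columns each coding produces for a 3-level factor
n_cols <- sapply(contrast_funs, function(f) ncol(f(3)))
n_cols   # 2 for every coding: one fewer than the number of levels

dim(contr.treatment(3))   # 3 rows (levels), 2 columns
```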

Well then, can we simply require that no contrasts be applied to a single-level factor? No. All contrasts methods forbid this:

contr.helmert(n = 1, contrasts = FALSE)
#Error in contr.helmert(n = 1, contrasts = FALSE) : 
#  not enough degrees of freedom to define contrasts

contr.poly(n = 1, contrasts = FALSE)
#Error in contr.poly(n = 1, contrasts = FALSE) : 
#  contrasts not defined for 0 degrees of freedom

contr.sum(n = 1, contrasts = FALSE)
#Error in contr.sum(n = 1, contrasts = FALSE) : 
#  not enough degrees of freedom to define contrasts

contr.treatment(n = 1, contrasts = FALSE)
#Error in contr.treatment(n = 1, contrasts = FALSE) : 
#  not enough degrees of freedom to define contrasts

contr.SAS(n = 1, contrasts = FALSE)
#Error in contr.treatment(n, base = if (is.numeric(n) && length(n) == 1L) n else length(n),  : 
#  not enough degrees of freedom to define contrasts

Actually, if you think about it carefully, you will conclude that without contrasts, a factor with a single level is just a dummy variable of all 1s, i.e., the intercept. So, you can definitely do the following:

dat$New.Runner <- 1    ## set it to 1, as if no contrasts were applied

mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = dat)
mod    ## printing the fit shows these coefficients
#(Intercept)   New.Runner     Last.Run  
#     1.4582           NA      -0.2507

You get an NA coefficient for New.Runner due to rank-deficiency. In fact, applying contrasts is a fundamental way to avoid rank-deficiency. It is just that when a factor has only one level, application of contrasts becomes a paradox.

Let's also have a look at the model matrix:

model.matrix(mod)
#   (Intercept) New.Runner Last.Run
#1            1          1        1
#2            1          1        5
#3            1          1        2
#4            1          1        6
#5            1          1        5
#6            1          1        4
#8            1          1        3
#9            1          1        7
#10           1          1        2
#11           1          1        4
#12           1          1        9
#13           1          1        8
#15           1          1        3
#16           1          1        5
#17           1          1        1
#19           1          1        6
#20           1          1       10
#21           1          1        7
#22           1          1        9
#23           1          1        2
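
The rank of this matrix can also be checked numerically. The sketch below rebuilds the matrix by hand from the values printed above (the name X is made up for illustration) and uses the QR decomposition to count the linearly independent columns:

```r
# Last.Run values of the 20 rows that remain in the model matrix
Last.Run <- c(1, 5, 2, 6, 5, 4, 3, 7, 2, 4, 9, 8, 3, 5, 1, 6, 10, 7, 9, 2)

# Hand-built copy of the model matrix: two constant columns plus Last.Run
X <- cbind("(Intercept)" = 1, New.Runner = 1, Last.Run = Last.Run)

ncol(X)       # 3 columns
qr(X)$rank    # rank 2: one column is redundant

# Removing either constant column restores full column rank
qr(X[, c("New.Runner", "Last.Run")])$rank   # 2, equal to the number of columns
```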

The (Intercept) and New.Runner columns are identical, so only one of them can be estimated. If you want to estimate New.Runner, drop the intercept:

glm(formula = WL ~ 0 + New.Runner + Last.Run, family = binomial, data = dat)
#New.Runner    Last.Run  
#    1.4582     -0.2507 

Make sure you digest the rank-deficiency issue thoroughly. If you have more than one single-level factor and you replace all of them by 1, dropping the intercept still results in rank-deficiency, because the replaced columns are identical to each other.

dat$foo.factor <- 1
glm(formula = WL ~ 0 + New.Runner + foo.factor + Last.Run, family = binomial, data = dat)
#New.Runner  foo.factor    Last.Run  
#    1.4582          NA     -0.2507 
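
An alternative to replacing constant factors by 1 is to leave them out of the formula altogether and let the intercept absorb them. The following is a minimal sketch of that idea, not part of the original answer: it assumes the default na.action, and the column foo.factor as well as the names predictors, is_constant and keep are hypothetical, chosen to mirror the example above.

```r
# Data from the question plus a hypothetical second constant column
mydf <- data.frame(
  WL = rep(c(1, 0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  foo.factor = "a",
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA)
)

# Rows glm() would actually use once incomplete rows are dropped
mf <- model.frame(WL ~ New.Runner + foo.factor + Last.Run, data = mydf)

# Predictors with fewer than 2 distinct values in those rows
predictors  <- setdiff(names(mf), "WL")
is_constant <- vapply(mf[predictors], function(x) length(unique(x)) < 2L, logical(1))
keep        <- predictors[!is_constant]

# Refit with the remaining predictors only
# (reformulate() fails if `keep` is empty; that case is not handled here)
fit <- glm(reformulate(keep, response = "WL"), family = binomial, data = mf)
coef(fit)   # no NA coefficients; the intercept stands in for the constant factors
```

The fitted intercept plays the same role as the New.Runner coefficient in the no-intercept model above, since both multiply a column of 1s.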
