Error in contrasts when defining a linear model in R


Question

When I try to define my linear model in R as follows:

lm1 <- lm(predictorvariable ~ x1+x2+x3, data=dataframe.df)

I get the following error message:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
contrasts can be applied only to factors with 2 or more levels 

Is there any way to ignore this or fix it? Some of the variables are factors and some are not.

Recommended answer

This error occurs when an independent (right-hand-side) variable is a factor or a character vector that takes only one value in the data.

Example: the iris data in R

(model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris))

# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)

# Coefficients:
#       (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica  
#            2.2514             0.8036             1.4587             1.9468  

Now, if your data contain only one species:

(model1 <- lm(Sepal.Length ~ Sepal.Width + Species,
              data=iris[iris$Species == "setosa", ]))
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
#   contrasts can be applied only to factors with 2 or more levels

If a variable is numeric (Sepal.Width) but takes only a single value, say 3, the model runs, but the coefficient of that variable is NA:

(model2 <-lm(Sepal.Length ~ Sepal.Width + Species,
             data=iris[iris$Sepal.Width == 3, ]))

# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Species, 
#    data = iris[iris$Sepal.Width == 3, ])

# Coefficients:
#       (Intercept)        Sepal.Width  Speciesversicolor   Speciesvirginica  
#             4.700                 NA              1.250              2.017
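
To spot such inestimable terms programmatically, one option is to look for NA entries in `coef()`. This is a small sketch that refits the model above; the name `sub3` is illustrative and not part of the original answer.

```r
# Rows where Sepal.Width is exactly 3, so that predictor is constant
sub3 <- iris[iris$Sepal.Width == 3, ]
model2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = sub3)

# Names of the coefficients that lm() could not estimate
na_terms <- names(which(is.na(coef(model2))))
na_terms
# [1] "Sepal.Width"
```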

Solution: An independent variable that takes only one value has no variation, so its effect cannot be estimated. Drop that variable from the model, whether it is numeric, character, or a factor.
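
As a minimal sketch of that fix, assuming the single-species subset from the example above, the model can be refitted without the constant factor. The names `setosa` and `model3` are illustrative only.

```r
# Subset in which Species takes a single value
setosa <- iris[iris$Species == "setosa", ]

# Leave the constant factor out of the formula; the fit now succeeds
model3 <- lm(Sepal.Length ~ Sepal.Width, data = setosa)
coef(model3)
```

The same result can be reached with `update(model, . ~ . - Species)` when a fitted model object already exists.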

Update based on comments: Because the error occurs only with factor or character variables, you can restrict the check to those and see whether the number of levels of each is 1 (DROP) or greater than 1 (NODROP).

To check which variables are factors, use the following code:

(l <- sapply(iris, function(x) is.factor(x)))
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
#        FALSE        FALSE        FALSE        FALSE         TRUE 

Then keep only the factor variables, as a data frame:

m <- iris[, l, drop = FALSE]  # drop = FALSE keeps a data frame even with a single factor column

Now count the levels actually present in each factor variable; a variable with only one must be dropped:

n <- sapply(m, function(x) nlevels(droplevels(x)))  # ignore unused levels, as lm() does
ifelse(n == 1, "DROP", "NODROP")

Note: A factor variable with only one level present in the data is the variable you have to drop.
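
The check can also be generalised to every column type, since a constant numeric or character column is just as uninformative. The sketch below uses a hypothetical helper, `constant_columns()`, which is not part of base R or of the original answer; it counts distinct non-missing values per column.

```r
# Hypothetical helper: names of columns with fewer than two
# distinct non-missing values
constant_columns <- function(df) {
  n_distinct <- sapply(df, function(x) length(unique(x[!is.na(x)])))
  names(df)[n_distinct < 2]
}

# Single-species subset from the earlier example
setosa <- iris[iris$Species == "setosa", ]
constant_columns(setosa)
# [1] "Species"
```

Counting observed values rather than declared levels matters here, because a subset of a factor keeps its unused levels unless `droplevels()` is applied.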
