Error when building regression model using lm (Error in `contrasts<-`(`*tmp*`... contrasts can be applied only to factors with 2 or more levels)

Problem description

I get this error depending on which variables I include and the sequence in which I specify them in the formula:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I've done a little research on this, and it looks like the error can be caused by the variable in question not being a factor variable. In this case, however, the variable (is_women_owned) is a factor with 2 levels ("Yes", "No").

> levels(customer_accounts$is_women_owned)
[1] "No"  "Yes"

No error:

f1 <- lm(combined_sales ~ is_women_owned, data=customer_accounts)

No error:

f2 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth, data=customer_accounts)

Regressing on the above formula plus the factor variable "is_women_owned":

f3 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth + is_women_owned, data=customer_accounts)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I get the same error when applying stepwise linear regression, as you would expect.

This seems like a bug: it should give us a model in which "is_women_owned" perhaps offers no additional explanatory value because it is highly correlated with the other variables, rather than erroring out like this.

I verified that there is no missing data for this variable, too:

> which(is.na(customer_accounts$is_women_owned))
integer(0)

Also, there are two values present in the factor variable:

customer_accounts$is_women_owned[1:20]
 [1] No  No  No  No  No  No  No  No  No  No  No  No  No  No  Yes No 
[17] No  No  No  No 
Levels: No Yes

Recommended answer

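# stringsAsFactors = TRUE is needed on R >= 4.0.0 so that f is a factor, matching the str() output below
twofac = data.frame("y" = c(1,2,3,4,5,1), "x" = c(2,56,3,5,2,1), "f" = c("apple","apple","apple","apple","apple","banana"), stringsAsFactors = TRUE)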
onefac = twofac[1:5,]

lm(y~x+f,data=twofac)
lm(y~x+f,data=onefac)

> str(onefac)
'data.frame':   5 obs. of  3 variables:
 $ y: num  1 2 3 4 5
 $ x: num  2 56 3 5 2
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1
> str(twofac)
'data.frame':   6 obs. of  3 variables:
 $ y: num  1 2 3 4 5 1
 $ x: num  2 56 3 5 2 1
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1 2
> lm(y~x+f,data=twofac)

Call:
lm(formula = y ~ x + f, data = twofac)

Coefficients:
(Intercept)            x      fbanana  
    3.30783     -0.02263     -2.28519  

> lm(y~x+f,data=onefac)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

If you run the above you will notice that twofac, a model with a 2-level factor where both levels are present, runs with no problem. onefac, a model with the same 2-level factor but with only one level present in the data, gives the same error you got.
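
The difference between declared and observed levels can be checked directly. The following is a minimal sketch that rebuilds the answer's `twofac` and `onefac` so it is self-contained; the names `declared`, `observed`, and `counts` are introduced here for illustration only. It shows that `levels()`/`nlevels()` report every declared level, while `droplevels()` and `table()` reveal which levels actually occur in the rows.

```r
# Rebuild the answer's toy data; stringsAsFactors = TRUE is needed on R >= 4.0.0
twofac <- data.frame(
  y = c(1, 2, 3, 4, 5, 1),
  x = c(2, 56, 3, 5, 2, 1),
  f = c("apple", "apple", "apple", "apple", "apple", "banana"),
  stringsAsFactors = TRUE
)
onefac <- twofac[1:5, ]  # row subsetting keeps the declared levels of f

# Declared levels: still includes "banana" even though no row has it
declared <- nlevels(onefac$f)

# Observed levels: drop levels with zero observations before counting
observed <- nlevels(droplevels(onefac$f))

# Per-level counts; an unused level shows up with a count of 0
counts <- table(onefac$f)
print(counts)
```

This is why a check such as `levels(customer_accounts$is_women_owned)` can show two levels even when the rows that reach the model contain only one of them.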

If your factor has only one of its levels present, then regressing against that factor gives no additional information, because it is constant across all observations.
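
One plausible way to end up in this situation, which the source does not confirm for the question's data, is missing values in the other predictors: `lm()` drops incomplete rows (assuming the default `na.action` of `na.omit`), and the remaining rows may contain only one level of the factor. The sketch below uses hypothetical data and hypothetical names (`d`, `used`, `observed_levels`, `problem_factors`) to show how such a factor could be detected before fitting.

```r
# Hypothetical data: f has both levels, but the only "banana" row has NA in x
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 1),
  x = c(2, 56, 3, 5, 2, NA),
  f = factor(c("apple", "apple", "apple", "apple", "apple", "banana"))
)

# Rows lm() would keep under na.omit: complete on every model variable
used <- d[complete.cases(d[, c("y", "x", "f")]), ]

# Number of observed levels for each factor column in the rows actually used
observed_levels <- sapply(
  Filter(is.factor, used),
  function(col) nlevels(droplevels(col))
)
print(observed_levels)

# Factors with fewer than 2 observed levels are the ones that break contrasts
problem_factors <- names(observed_levels)[observed_levels < 2]
print(problem_factors)

# Fitting on the full data fails; capture the error message instead of stopping
fit <- tryCatch(
  lm(y ~ x + f, data = d),
  error = function(e) conditionMessage(e)
)
print(fit)
```

The same check could be run on `customer_accounts` with the variables from `f3`. It would also be consistent with `f1` fitting without error, since a model with fewer predictors discards fewer rows.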
