Error when building regression model using lm (Error in `contrasts<-`(`*tmp*`... contrasts can be applied only to factors with 2 or more levels)
Problem description
I get this error depending on which variables I include and the order in which I specify them in the formula:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
I've done a little research on this, and it looks like the error can be caused by the variable in question not being a factor. In this case, however, the variable (is_women_owned) is a factor with 2 levels ("No", "Yes").
> levels(customer_accounts$is_women_owned)
[1] "No" "Yes"
No error:
f1 <- lm(combined_sales ~ is_women_owned, data=customer_accounts)
No error:
f2 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth, data=customer_accounts)
Regressing on the above formula plus the factor variable "is_women_owned":
f3 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth + is_women_owned, data=customer_accounts)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
I get the same error when applying stepwise linear regression, as you would expect.
This seems like a bug: it should give us a model in which "is_women_owned" perhaps offers no additional explanatory value because it is highly correlated with the other variables, rather than erroring out like this.
I also verified that there is no missing data for this variable:
> which(is.na(customer_accounts$is_women_owned))
integer(0)
Also, both values are present in the factor variable:
customer_accounts$is_women_owned[1:20]
[1] No No No No No No No No No No No No No No Yes No
[17] No No No No
Levels: No Yes
Recommended answer
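# stringsAsFactors = TRUE makes f a factor in R >= 4.0 as well, matching the str() output below
twofac = data.frame("y" = c(1,2,3,4,5,1), "x" = c(2,56,3,5,2,1), "f" = c("apple","apple","apple","apple","apple","banana"), stringsAsFactors = TRUE)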
onefac = twofac[1:5,]
lm(y~x+f,data=twofac)
lm(y~x+f,data=onefac)
> str(onefac)
'data.frame': 5 obs. of 3 variables:
$ y: num 1 2 3 4 5
$ x: num 2 56 3 5 2
$ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1
> str(twofac)
'data.frame': 6 obs. of 3 variables:
$ y: num 1 2 3 4 5 1
$ x: num 2 56 3 5 2 1
$ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1 2
> lm(y~x+f,data=twofac)
Call:
lm(formula = y ~ x + f, data = twofac)
Coefficients:
(Intercept) x fbanana
3.30783 -0.02263 -2.28519
> lm(y~x+f,data=onefac)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
If you run the above, you will notice that twofac, a model with a 2-level factor where both levels are present, runs with no problem. onefac, a model with the same 2-level factor but with only one level present, gives the same error you got.
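The difference between the levels a factor declares and the levels that actually occur in the data can be checked directly. The following is a minimal sketch that rebuilds the same twofac and onefac data frames (using factor() explicitly so it does not depend on the R version's stringsAsFactors default):

```r
# Same toy data as above; f is created as a factor explicitly
twofac <- data.frame(
  y = c(1, 2, 3, 4, 5, 1),
  x = c(2, 56, 3, 5, 2, 1),
  f = factor(c("apple", "apple", "apple", "apple", "apple", "banana"))
)
onefac <- twofac[1:5, ]

# Declared levels are unchanged by subsetting rows
declared <- nlevels(onefac$f)

# Levels that actually occur in the remaining rows
observed <- nlevels(droplevels(onefac$f))

# Per-level counts make the empty level visible
counts <- table(onefac$f)
print(counts)
```

Here levels(onefac$f) still reports both "apple" and "banana", which is why checking levels() alone can be misleading.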
If only one level of your factor is present, then regressing against that factor gives no additional information, because it is constant across all observations.
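One way a factor can end up with a single level even though both levels appear in the full data frame is row deletion: by default lm() drops every row that has an NA in any variable used in the formula, and then drops unused factor levels. Whether this is what happened in the question is an assumption, since the data are not shown. The sketch below uses a hypothetical data frame (accounts, sales, revenue, and owned are made-up names) in which the only "Yes" row has a missing revenue value, and uses model.frame() to see which rows the model would actually use:

```r
# Hypothetical stand-in data: the factor has both levels and no NAs,
# but the only "Yes" row is missing a value in another predictor
accounts <- data.frame(
  sales   = c(10, 12, 9, 14, 11, 13),
  revenue = c(100, 120, 95, 140, 110, NA),
  owned   = factor(c("No", "No", "No", "No", "No", "Yes"))
)

# model.frame() applies the default NA handling (na.omit), as lm() does
mf <- model.frame(sales ~ revenue + owned, data = accounts)

# For each factor column, count the levels still present after row deletion
is_fac  <- vapply(mf, is.factor, logical(1))
present <- vapply(mf[is_fac], function(col) nlevels(droplevels(col)), integer(1))

# Factors listed here would trigger the contrasts error in lm()
problem_factors <- present[present < 2]
print(problem_factors)
```

If a factor shows up in this list, possible remedies include removing it from the formula, or handling the missing values in the other predictors so that rows with the second level are retained.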