What is causing this error? Coefficients not defined because of singularities


Question

I'm trying to fit a model to my data, but I get the message "Coefficients: (3 not defined because of singularities)". The undefined coefficients are those for winter, large and high_flow.

I found this: https://stats.stackexchange.com/questions/13465/how-to-deal-with-an-error-such-as-coefficients-14-not-defined-because-of-singu

which said the cause may be incorrect dummy variables, but I've checked that none of my columns are duplicates.

When I use the function alias() I get:

Model :
S ~ A + B + C + D + E + F + G + spring + summer + autumn + winter + small + medium + large + low_flow + med_flow + high_flow

Complete :
          (Intercept) A  B  C  D  E  F  G  spring summer autumn small medium
winter     1           0  0  0  0  0  0  0 -1     -1     -1      0     0    
large      1           0  0  0  0  0  0  0  0      0      0     -1    -1    
high_flow  1           0  0  0  0  0  0  0  0      0      0      0     0    
          low_flow med_flow
winter     0        0      
large      0        0      
high_flow -1       -1      

Columns A-H of my data contain numeric values; the remaining columns take 0 or 1, and I have checked that there are no conflicting values (i.e. if spring = 1 for a case, then autumn = summer = winter = 0).

model_1 <- lm(S ~ A+B+C+D+E+F+G+spring+summer+autumn+winter+small+medium+large+low_flow+med_flow+high_flow, data = trainOne)
summary(model_1)

Can someone explain the error, please?

Example of my data before I changed it to binary:

season  size   flow  A  B   C   D   E   F   G  S
spring small  medium 52 72 134  48 114 114 142 11
autumn small  medium 43 21  98 165 108  23  60 31
spring medium medium 41 45 161  86 177 145  32 12
autumn large  medium 40 86 132  80  82 138 186 16
winter medium  high  49 32 147 189 125  43 144 67
summer large   high  43  9 158  64  14 146  15 71

Recommended answer

@JuliusVainora has already given you a good explanation of how the error occurs, which I will not repeat. However, Julius' answer is only one method, and it might not be satisfying if you don't understand that there really is a value for cases where winter = 1, large = 1 and high_flow = 1. It can readily be seen in the display as the value for "(Intercept)". You might be able to make the result more interpretable by adding +0 to your formula. (Or you might not, depending on the data.)
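To make the point about "(Intercept)" concrete, here is a minimal sketch with a single dummy group. The data frame `d` and its values are invented for illustration (they are not `trainOne`), and only the season dummies are included to keep the example small.

```r
# Toy data: values are invented for illustration, not taken from trainOne.
d <- data.frame(
  season = c("spring", "autumn", "spring", "autumn",
             "winter", "summer", "winter", "summer"),
  A = c(52, 43, 41, 40, 49, 43, 47, 45),
  S = c(11, 31, 12, 16, 67, 71, 60, 65)
)

# One 0/1 dummy column per level, as in the question.
for (lev in c("spring", "summer", "autumn", "winter")) {
  d[[lev]] <- as.integer(d$season == lev)
}

# Every row has exactly one season, so the four dummies add up to the
# intercept column of ones. That exact linear dependence is the singularity.
dummy_sum <- d$spring + d$summer + d$autumn + d$winter

# All four dummies plus an intercept: lm() cannot estimate the last one.
fit_all <- lm(S ~ A + spring + summer + autumn + winter, data = d)

# Drop one dummy: winter becomes the baseline absorbed by "(Intercept)".
fit_drop <- lm(S ~ A + spring + summer + autumn, data = d)

# Or drop the intercept with + 0: each season gets its own coefficient.
fit_noint <- lm(S ~ 0 + A + spring + summer + autumn + winter, data = d)

print(coef(fit_all))
print(coef(fit_noint))
```

All three fits give the same fitted values; only the parameterisation differs. Note that with three dummy groups, as in the question, `+ 0` removes only one of the three dependencies, which is presumably why it may or may not help.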

However, I think that you really should re-examine how you code your categorical variables. You are using a method of one dummy variable per level that you are copying from some other system, perhaps SAS or SPSS? That will predictably cause problems for you in the future, and it is also a painful method to code and maintain. R has factors, which encode multiple levels in a single variable, and the data.frame function created them automatically from character columns before R 4.0.0 (in later versions, pass stringsAsFactors = TRUE or call factor() yourself). (Read ?factor.) So your formula would become:

 S ~ A + B + C + D + E + F + G + season + size + flow
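A minimal sketch of the factor-based approach follows. The data are synthetic (one row per season/size/flow combination, with random `A` and `S`), and only one numeric predictor is used; the level names and the choice of winter, large and high as baselines are assumptions made to mirror the question's output.

```r
# Synthetic data for illustration only: every season/size/flow combination once.
d <- expand.grid(
  season = c("spring", "summer", "autumn", "winter"),
  size   = c("small", "medium", "large"),
  flow   = c("low", "medium", "high"),
  stringsAsFactors = TRUE
)
set.seed(1)
d$A <- rnorm(nrow(d), mean = 45, sd = 5)   # made-up numeric predictor
d$S <- rnorm(nrow(d), mean = 30, sd = 10)  # made-up response

# Choose the baseline level of each factor explicitly. With the default
# treatment contrasts, lm() creates dummies for all other levels only.
d$season <- relevel(d$season, ref = "winter")
d$size   <- relevel(d$size,   ref = "large")
d$flow   <- relevel(d$flow,   ref = "high")

fit <- lm(S ~ A + season + size + flow, data = d)

# Nine coefficients: intercept, A, 3 seasons, 2 sizes, 2 flows. None is NA.
print(coef(fit))
```

Here "(Intercept)" is the expected value for a winter, large, high-flow case with `A = 0`, and each remaining coefficient is a difference from that baseline.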
