coxph() X matrix deemed to be singular


Question

I'm having some trouble using coxph(). I have two categorical variables, "tecnologia" and "pais", and I want to evaluate the possible interaction effect of "pais" on "tecnologia". "tecnologia" is a factor with 2 levels: gps and convencional. "pais" also has 2 levels: PT and ES. I have no idea why this warning keeps appearing. Here's the code and the output:

cox_AC<-coxph(Surv(dados_temp$dias_seg,dados_temp$status)~tecnologia*pais,data=dados_temp)
Warning message:
In coxph(Surv(dados_temp$dias_seg, dados_temp$status) ~ tecnologia *  :
  X matrix deemed to be singular; variable 3

> cox_AC
Call:
coxph(formula = Surv(dados_temp$dias_seg, dados_temp$status) ~ 
    tecnologia * pais, data = dados_temp)


                       coef exp(coef) se(coef)     z     p
tecnologiagps        -0.152     0.859    0.400 -0.38 7e-01
paisPT                1.469     4.345    0.406  3.62 3e-04
tecnologiagps:paisPT     NA        NA    0.000    NA    NA

Likelihood ratio test=23.8  on 2 df, p=6.82e-06  n= 127, number of events= 64 

I asked a similar question some months ago, but I'm opening another one on this subject because I'm facing the same problem again with other data. This time I'm sure it's not a data-related problem.

Can somebody help me? Thank you.

UPDATE: The problem does not seem to be one of perfect classification:

> xtabs(~status+tecnologia,data=dados)  

      tecnologia
status conv doppler gps  
     0   39       6  24  
     1   30       3  34 

> xtabs(~status+pais,data=dados)  

      pais  
status ES PT  
     0 71  8  
     1 49 28  

> xtabs(~tecnologia+pais,data=dados)

          pais  
tecnologia ES PT
   conv    69  0
   doppler  1  8
   gps     30 28


Recommended answer

Here's a simple example which seems to reproduce your problem:

> library(survival)
> (df1 <- data.frame(t1=seq(1:6),
                    s1=rep(c(0, 1), 3),
                    te1=c(rep(0, 3), rep(1, 3)),
                    pa1=c(0,0,1,0,0,0)
                    ))
   t1 s1 te1 pa1
 1  1  0   0   0
 2  2  1   0   0
 3  3  0   0   1
 4  4  1   1   0
 5  5  0   1   0
 6  6  1   1   0

> (coxph(Surv(t1, s1) ~ te1*pa1, data=df1))
Call:
coxph(formula = Surv(t1, s1) ~ te1 * pa1, data = df1)


        coef exp(coef) se(coef)         z  p
te1      -23  9.84e-11    58208 -0.000396  1
pa1      -23  9.84e-11   100819 -0.000229  1
te1:pa1   NA        NA        0        NA NA

Now let's look for 'perfect classification', like so:

> (xtabs( ~ s1+te1, data=df1))
   te1
s1  0 1
  0 2 1
  1 1 2
> (xtabs( ~ s1+pa1, data=df1))
   pa1
s1  0 1
  0 2 1
  1 3 0

Note that a value of 1 for pa1 exactly predicts a status s1 equal to 0. That is to say, based on your data, if you know that pa1==1 then you can be sure that s1==0. Fitting Cox's model is therefore not appropriate in this setting and will run into numerical problems. This can be seen with

> coxph(Surv(t1, s1) ~ pa1, data=df1)

which gives

Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1 ; beta may be infinite. 

It's important to look at these cross tables before fitting models. It's also worth starting with simpler models before considering those involving interactions.
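The cross-table check can be scripted so that it runs for every predictor before a model is fitted. The following is a minimal sketch, not part of the original answer: `has_empty_cell` is a hypothetical helper name, and the data frame is the same toy `df1` used in this answer, rebuilt so the snippet runs on its own.

```r
# Toy data from the example in this answer
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Hypothetical helper: TRUE if any outcome-by-predictor cell is empty,
# i.e. some predictor level never occurs with some status value
has_empty_cell <- function(data, outcome, predictor) {
  tab <- table(data[[outcome]], data[[predictor]])
  any(tab == 0)
}

# Flag each candidate predictor against the status variable
flags <- sapply(c("te1", "pa1"), function(p) has_empty_cell(df1, "s1", p))
print(flags)
```

Here `pa1` is flagged because no row has both `pa1 == 1` and `s1 == 1`, while `te1` is not. An empty cell is only a prompt to look at the table more closely; it does not by itself say which model to use.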

If we add the interaction term to df1 manually like this:

> (df1 <- within(df1,
+               te1pa1 <- te1*pa1))
  t1 s1 te1 pa1 te1pa1
1  1  0   0   0      0
2  2  1   0   0      0
3  3  0   0   1      0
4  4  1   1   0      0
5  5  0   1   0      0
6  6  1   1   0      0

and then tabulate status against it:

> (xtabs( ~ s1+te1pa1, data=df1))
   te1pa1
s1  0
  0 3
  1 3

We can see that it's a useless classifier: it is 0 for every row, so it does not help predict status s1.

When combining all 3 terms, the fitter does manage to produce numerical values for te1 and pa1, even though pa1 is a perfect predictor as above. However, a look at the coefficients and their standard errors shows them to be implausible.

Edit @JMarcelino: If you look at the output from the first coxph model in the example, you'll see the warning message:

2: In coxph(Surv(t1, s1) ~ te1 * pa1, data = df1) :
  X matrix deemed to be singular; variable 3

This is likely the same warning you're getting, and it is due to this problem of classification. Also, your third cross table, xtabs(~ tecnologia+pais, data=dados), is not as important as the table of status by interaction term. You could add the interaction term manually first, as in the example above, and then check the cross table. Or you could say:

> with(df1,
       table(s1, pa1te1=pa1*te1))
   pa1te1
s1  0
  0 3
  1 3
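Multiplying the columns works here because pa1 and te1 are numeric 0/1 indicators. For factor predictors such as tecnologia and pais, one possible equivalent is base R's `interaction()`, which builds a combined factor that can be tabulated against status. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not the asker's data) purely to show the shape of the check.

```r
# Hypothetical toy data with factor predictors (values invented for illustration)
toy <- data.frame(
  status     = c(0, 1, 1, 0, 1, 0),
  tecnologia = factor(c("conv", "conv", "gps", "gps", "gps", "gps")),
  pais       = factor(c("ES", "ES", "ES", "ES", "PT", "PT"))
)

# Status by every tecnologia/pais combination; interaction() keeps
# combinations with no observations as empty levels by default
tab <- with(toy, table(status, combo = interaction(tecnologia, pais)))
print(tab)

# Number of observations per combination; a 0 marks an unobserved combination
print(colSums(tab))
```

In this toy data the conv.PT column is empty, which is the same kind of pattern as the zero cell in the third table of the question.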

That said, I notice that one of the cells in your third table is zero (conv, PT), meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.
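One way to see why the empty (conv, PT) cell matters is to inspect the design matrix that `tecnologia * pais` expands to. The sketch below rebuilds only the predictor columns from the counts in the third cross table. It assumes that dados_temp is dados without the 9 doppler rows, which is consistent with n = 127 = 69 + 30 + 28 but is not stated in the question; the names `cells`, `pred` and `X` are illustrative.

```r
# Predictor combinations and counts taken from the third cross table,
# doppler rows left out (assumption: dados_temp excludes them)
cells <- data.frame(tecnologia = c("conv", "gps", "gps"),
                    pais       = c("ES",   "ES",  "PT"),
                    n          = c(69, 30, 28))

# One row per observation
pred <- cells[rep(seq_len(nrow(cells)), cells$n), c("tecnologia", "pais")]
pred$tecnologia <- factor(pred$tecnologia, levels = c("conv", "gps"))
pred$pais       <- factor(pred$pais, levels = c("ES", "PT"))

# Columns generated by the formula; the intercept is dropped because
# a Cox model has no intercept term
X <- model.matrix(~ tecnologia * pais, data = pred)[, -1]
print(colnames(X))

# A rank below the number of columns means at least one column is redundant
print(c(rank = qr(X)$rank, columns = ncol(X)))

# With no conv/PT rows, the interaction column duplicates the paisPT column
print(all(X[, "paisPT"] == X[, "tecnologiagps:paisPT"]))
```

Under that assumption every PT observation is also a gps observation, so the third column carries no information beyond paisPT. That would match the NA coefficient reported for "variable 3" in the question's output.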

In general, the outcome should have some values for all levels of the predictors, and the predictors should not classify the outcome as exactly all or nothing or 50/50.

Edit 2 @user75782131: Yes, generally speaking, xtabs or a similar cross table should be checked for models where the outcome and predictors are discrete, i.e. have a limited number of levels. If 'perfect classification' is present, then a predictive model / regression may not be appropriate. This is true, for example, for logistic regression (where the outcome is binary) as well as for Cox's model.
