coxph() X matrix deemed to be singular
Problem description
I'm having some trouble using coxph(). I have two categorical variables, "tecnologia" and "pais", and I want to evaluate the possible interaction effect of "pais" on "tecnologia". "tecnologia" is a factor with 2 levels: gps and convencional. "pais" also has 2 levels: PT and ES. I have no idea why this warning keeps appearing. Here's the code and the output:
cox_AC<-coxph(Surv(dados_temp$dias_seg,dados_temp$status)~tecnologia*pais,data=dados_temp)
Warning message:
In coxph(Surv(dados_temp$dias_seg, dados_temp$status) ~ tecnologia * :
X matrix deemed to be singular; variable 3
> cox_AC
Call:
coxph(formula = Surv(dados_temp$dias_seg, dados_temp$status) ~
tecnologia * pais, data = dados_temp)
                       coef exp(coef) se(coef)     z     p
tecnologiagps        -0.152     0.859    0.400 -0.38 7e-01
paisPT                1.469     4.345    0.406  3.62 3e-04
tecnologiagps:paisPT     NA        NA    0.000    NA    NA

Likelihood ratio test=23.8 on 2 df, p=6.82e-06
n= 127, number of events= 64
I'm opening another question on this subject, although I asked a similar one some months ago, because I'm facing the same problem again with other data. This time I'm sure it's not a data-related problem.
Can somebody help me? Thank you.
UPDATE: The problem does not seem to be one of perfect classification:
> xtabs(~status+tecnologia,data=dados)
      tecnologia
status conv doppler gps
     0   39       6  24
     1   30       3  34
> xtabs(~status+pais,data=dados)
      pais
status ES PT
     0 71  8
     1 49 28
> xtabs(~tecnologia+pais,data=dados)
          pais
tecnologia ES PT
   conv    69  0
   doppler  1  8
   gps     30 28
Recommended answer
Here's a simple example which seems to reproduce your problem:
> library(survival)
> (df1 <- data.frame(t1=seq(1:6),
s1=rep(c(0, 1), 3),
te1=c(rep(0, 3), rep(1, 3)),
pa1=c(0,0,1,0,0,0)
))
  t1 s1 te1 pa1
1  1  0   0   0
2  2  1   0   0
3  3  0   0   1
4  4  1   1   0
5  5  0   1   0
6  6  1   1   0
> (coxph(Surv(t1, s1) ~ te1*pa1, data=df1))
Call:
coxph(formula = Surv(t1, s1) ~ te1 * pa1, data = df1)
        coef exp(coef) se(coef)         z p
te1      -23  9.84e-11    58208 -0.000396 1
pa1      -23  9.84e-11   100819 -0.000229 1
te1:pa1   NA        NA        0        NA NA
Now let's look for 'perfect classification' like so:
> (xtabs( ~ s1+te1, data=df1))
   te1
s1  0 1
  0 2 1
  1 1 2
> (xtabs( ~ s1+pa1, data=df1))
   pa1
s1  0 1
  0 2 1
  1 3 0
Note that a value of 1 for pa1 exactly predicts having a status s1 equal to 0. That is to say, based on your data, if you know that pa1==1 then you can be sure that s1==0. Thus fitting Cox's model is not appropriate in this setting and will result in numerical errors.
This can be seen with
> coxph(Surv(t1, s1) ~ pa1, data=df1)
giving
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 1 ; beta may be infinite.
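The cross-table check used above can also be automated. The following is a minimal sketch, assuming a hypothetical helper has_empty_cell() (it is not part of base R or the survival package) that reports whether any status-by-predictor cell has zero observations:

```r
# Toy data from the answer's example.
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Hypothetical helper (an assumption for illustration):
# TRUE if any cell of the status-by-predictor cross table is empty.
has_empty_cell <- function(data, status, predictor) {
  tab <- table(data[[status]], data[[predictor]])
  any(tab == 0)
}

empty_te1 <- has_empty_cell(df1, "s1", "te1")  # no empty cell for te1
empty_pa1 <- has_empty_cell(df1, "s1", "pa1")  # pa1 == 1 never has s1 == 1
```

An empty cell does not always mean the model is unusable, but it is a cheap signal that a coefficient may be poorly determined or infinite.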
It's important to look at these cross tables before fitting models. It's also worth starting with simpler models before considering those involving interactions.
If we add the interaction term to df1 manually like this:
> (df1 <- within(df1,
+ te1pa1 <- te1*pa1))
  t1 s1 te1 pa1 te1pa1
1  1  0   0   0      0
2  2  1   0   0      0
3  3  0   0   1      0
4  4  1   1   0      0
5  5  0   1   0      0
6  6  1   1   0      0
Then check it with
> (xtabs( ~ s1+te1pa1, data=df1))
   te1pa1
s1  0
  0 3
  1 3
We can see that it's a useless classifier, i.e. it does not help predict status s1: te1pa1 is 0 for every observation. A constant column like this carries no information, and it leaves the X matrix singular.
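The singularity can be checked directly on the design matrix. This is a sketch of the linear-algebra idea only, not a description of how coxph() detects the problem internally; it assumes the same toy data as above and drops the intercept column because a Cox model has none:

```r
# Toy data from the answer's example.
df1 <- data.frame(t1  = 1:6,
                  s1  = rep(c(0, 1), 3),
                  te1 = c(rep(0, 3), rep(1, 3)),
                  pa1 = c(0, 0, 1, 0, 0, 0))

# Design matrix for te1 * pa1 without the intercept column.
X <- model.matrix(~ te1 * pa1, data = df1)[, -1]

n_cols <- ncol(X)      # number of coefficients requested
rank_X <- qr(X)$rank   # number that can actually be estimated

# Names of columns that take a single value for every observation.
constant_cols <- colnames(X)[apply(X, 2, function(col) length(unique(col)) == 1)]
```

When rank_X is smaller than n_cols, at least one coefficient cannot be estimated, which is consistent with the NA reported for te1:pa1.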
When combining all 3 terms, the fitter does manage to produce a numerical value for te1 and pa1 even though pa1 is a perfect predictor as above. However, a look at the values of the coefficients and their standard errors shows them to be implausible.
Edit @JMarcelino: If you look at the output from the first coxph model in the example, you'll see the warning message:
2: In coxph(Surv(t1, s1) ~ te1 * pa1, data = df1) :
X matrix deemed to be singular; variable 3
This is likely the same warning you're getting, and it is due to this classification problem. Also, your third cross table, xtabs(~ tecnologia+pais, data=dados), is not as important as the table of status by interaction term. You could add the interaction term manually first, as in the example above, then check the cross table. Or you could say:
> with(df1,
+      table(s1, pa1te1=pa1*te1))
   pa1te1
s1  0
  0 3
  1 3
That said, I notice one of the cells in your third table has a zero (conv, PT), meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.
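To see how the empty (conv, PT) cell could produce the warning, here is a rough sketch that rebuilds only the predictor columns from the cell counts in the question's third table. It assumes dados_temp is dados with the doppler rows dropped, which is a guess, although 69 + 30 + 28 = 127 does match the n reported by coxph(). No survival times are simulated:

```r
# Cell counts for conv and gps from the question's third cross table.
# Dropping doppler is an assumption about how dados_temp was built.
cells <- data.frame(
  tecnologia = factor(c("conv", "gps", "gps"), levels = c("conv", "gps")),
  pais       = factor(c("ES", "ES", "PT"), levels = c("ES", "PT")),
  n          = c(69, 30, 28)
)

# Expand the counts into one row per observation.
sim <- cells[rep(seq_len(nrow(cells)), cells$n), c("tecnologia", "pais")]

# Design matrix without the intercept, as in a Cox model.
X <- model.matrix(~ tecnologia * pais, data = sim)[, -1]

# Every PT observation is gps, so the interaction column duplicates paisPT.
same_as_pais <- all(X[, "tecnologiagps:paisPT"] == X[, "paisPT"])
rank_X <- qr(X)$rank
```

Under this assumption the third column is an exact copy of the second, so only two of the three coefficients are estimable, which would explain "X matrix deemed to be singular; variable 3" and the NA for tecnologiagps:paisPT.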
In general, the outcome should have some values for all levels of the predictors, and the predictors should not classify the outcome as exactly all or nothing or 50/50.
Edit 2 @user75782131: Yes, generally speaking, xtabs or a similar cross table should be run for models where the outcome and predictors are discrete, i.e. have a limited number of levels. If 'perfect classification' is present then a predictive model / regression may not be appropriate. This is true, for example, for logistic regression (where the outcome is binary) as well as Cox's model.