Coxph predictions don't match the coefficients


Question

Good afternoon,

I could post reproducible code, and certainly will if everyone agrees that something is wrong, but right now I think my question is quite simple and someone can point me down the right path.

I am working with a data set like this:

created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    36     0
2                  true    36     0
3                  true     0     1
4                  true    28     0
5                  true     9     0
6                  true     0     1
7                  true    13     0
8                  true    19     0
9                  true     9     0
10                 true    16     0

I fitted a Cox regression model like this:

library(survival)  # provides coxph() and Surv()

fit_train = coxph(Surv(time = t, event = c) ~ created_as_free_user, data = teste)
summary(fit_train)

And received:

Call:
coxph(formula = Surv(time = t, event = c) ~ created_as_free_user, 
    data = teste)

  n= 9000, number of events= 1233 

                            coef exp(coef) se(coef)      z Pr(>|z|)    
created_as_free_usertrue -0.7205    0.4865   0.1628 -4.426 9.59e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                         exp(coef) exp(-coef) lower .95 upper .95
created_as_free_usertrue    0.4865      2.055    0.3536    0.6693

Concordance= 0.511  (se = 0.002 )
Rsquare= 0.002   (max possible= 0.908 )
Likelihood ratio test= 15.81  on 1 df,   p=7e-05
Wald test            = 19.59  on 1 df,   p=9.589e-06
Score (logrank) test = 20.45  on 1 df,   p=6.109e-06

So far so good. Next step: predict the results on new data. I understand the different types of predictions that predict.coxph can give me (or at least I think I do). Let's use type = "lp":

head(predict(fit_train,validacao,type = "lp"),n=20)

And get:

     1           2           3           4           5           6           7           8           9          10 
-0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 
         11          12          13          14          15          16          17          18          19          20 
-0.01208854 -0.01208854  0.70842049 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 -0.01208854 

OK. But when I look at the data that I am trying to predict:

# A tibble: 9,000 × 3
   created_as_free_user     t     c
                 <fctr> <int> <int>
1                  true    20     0
2                  true    12     0
3                  true     0     1
4                  true    10     0
5                  true    51     0
6                  true    36     0
7                  true    44     0
8                  true     0     1
9                  true    27     0
10                 true     6     0
# ... with 8,990 more rows

This confuses me.

Isn't type = "lp" supposed to give the linear predictor? For the data above that I am trying to predict, since the created_as_free_user variable is equal to true, am I wrong to expect the type = "lp" prediction to be exactly -0.7205 (the coefficient of the model above)? Where does the -0.01208854 come from? I suspect it's some sort of scaling, but I couldn't find the answer online.

My final objective is the h(t) given by the prediction type = "expected", but I am not comfortable using it because it relies on this -0.01208854 value that I don't fully understand.

Thanks a lot

Recommended answer

The Details section in ?predict.coxph reads:


The Cox model is a relative risk model; predictions of type "linear predictor", "risk", and "terms" are all relative to the sample from which they came. By default, the reference value for each of these is the mean covariate within strata.

To illustrate what this means, we can look at a simple example. Some fake data:

test1 <- list(time=c(4,3,1,1,1), 
             status=c(1,1,1,0,0), 
             x=c(0,2,1,1,0)) 

We fit a model and view the predictions:

fit <- coxph(Surv(time, status) ~ x, test1) 
predict(fit, type = "lp")
# [1] -0.6976630  1.0464945  0.1744157  0.1744157 -0.6976630

The predictions are the same as:

(test1$x - mean(test1$x)) * coef(fit)
# [1] -0.6976630  1.0464945  0.1744157  0.1744157 -0.6976630
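The same centering should carry over to predictions on new data: the reference is the covariate mean of the sample the model was fitted on, not of the new data. The following is a minimal sketch under that assumption. The `new_obs` data frame is hypothetical and not part of the original post, and the sketch relies on the fitted object storing the reference values in its `means` component.

```r
library(survival)

# Fake data from the example above.
test1 <- list(time   = c(4, 3, 1, 1, 1),
              status = c(1, 1, 1, 0, 0),
              x      = c(0, 2, 1, 1, 0))
fit <- coxph(Surv(time, status) ~ x, test1)

# Hypothetical new observations (not from the original post).
new_obs <- data.frame(x = c(0, 1, 2))

# Linear predictor as returned by predict() for the new data.
lp_pred <- as.numeric(predict(fit, newdata = new_obs, type = "lp"))

# Same quantity by hand: center at the training mean stored in fit$means,
# then multiply by the fitted coefficient.
lp_hand <- as.numeric((new_obs$x - fit$means) * coef(fit))

# Compare the two up to numerical tolerance.
all.equal(lp_pred, lp_hand)
```

If the comparison holds, a new observation with x equal to the training mean gets a linear predictor of 0, which is why a prediction is read as a log relative risk against the "average" training subject rather than as the raw coefficient times x.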

(Using this logic and some arithmetic, we can back out from your results that 8849 of the 9000 observations have created_as_free_user equal to "true".)
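That arithmetic can be sketched as follows, using only numbers quoted in the question. It assumes the factor is coded as a 0/1 dummy (1 for "true") and that the reference value is the mean of that dummy over the 9000 rows used for fitting; the variable names are illustrative.

```r
# Numbers taken from the question's output.
beta    <- -0.7205       # coefficient for created_as_free_usertrue
lp_true <- -0.01208854   # predicted lp for a "true" observation
n       <- 9000          # sample size reported by summary()

# For a "true" row the dummy is 1, so lp_true = (1 - p) * beta,
# where p is the share of "true" rows in the fitting sample.
p <- 1 - lp_true / beta

# Implied number of "true" rows, rounded to a whole count.
n_true <- round(p * n)
n_true
# [1] 8849

# Implied lp for a "false" row (dummy 0): (0 - p) * beta, roughly 0.708.
lp_false <- (0 - p) * beta
```

The implied value for a "false" row agrees with the 0.70842049 shown for observation 13 to about four decimal places; the small gap comes from using the rounded coefficient -0.7205.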
