考克斯比例风险模型 [英] Cox proportional hazard model
问题描述
我正在尝试对4组数据运行Cox比例风险模型.这是数据:
I am trying to run Cox proportional hazard model on a data of 4 groups. Here's the data:
我正在使用以下代码:
time_Allo_NHL<- c(28,32,49,84,357,933,1078,1183,1560,2114,2144)
censor_Allo_NHL<- c(rep(1,5), rep(0,6))
time_Auto_NHL<- c(42,53,57,63,81,140,176,210,252,476,524,1037)
censor_Auto_NHL<- c(rep(1,7), rep(0,1), rep(1,1), rep(0,1), rep(1,1), rep(0,1))
time_Allo_HOD<- c(2,4,72,77,79)
censor_Allo_HOD<- c(rep(1,5))
time_Auto_HOD<- c(30,36,41,52,62,108,132,180,307,406,446,484,748,1290,1345)
censor_Auto_HOD<- c(rep(1,7), rep(0,8))
myData <- data.frame(time=c(time_Allo_NHL, time_Auto_NHL, time_Allo_HOD, time_Auto_HOD),
censor=c(censor_Allo_NHL, censor_Auto_NHL, censor_Allo_HOD, censor_Auto_HOD),
group= rep(1:4,), each= )
str(myData)
问题在于每组都有不同数量的观察结果.我应该在代码中修改的内容:
The problem is each group has different number of observations. What I should modify in the code :
myData <- data.frame(time=c(time_Allo_NHL, time_Auto_NHL, time_Allo_HOD, time_Auto_HOD),
censor=c(censor_Allo_NHL, censor_Auto_NHL, censor_Allo_HOD,
censor_Auto_HOD), group= rep(1:4,), each= )
代替编写 each =#
,这样我就可以正确运行代码以完成Cox比例风险模型了吗?
Instead of writing each=#
so I can run the code properly in order to complete doing the Cox proportional hazard model?
然后我尝试使用以下代码运行Cox比例风险模型:
Then I have attempted to run a Cox proportional hazard model using the following code:
library(survival)
for(i in 1:43){
if (myData$group[i]==2)
myData$Z1[i]<-1
else myData$Z1[i]<-0
}
for(i in 1:43){
if (myData$group[i]==3)
myData$Z2[i]<-1
else myData$Z2[i]<-0
}
for(i in 1:43){
if (myData$group[i]==4)
myData$Z3[i]<-1
else myData$Z3[i]<-0
}
myData
Coxfit<-coxph(Surv(time,censor)~Z1+Z2+Z3, data = myData)
summary(Coxfit)
这就是我所拥有的.没有价值!!
This is all I got. There's no valuse!!
接下来,我想使用主要效果和相互作用项来测试移植类型和疾病类型之间的相互作用.
Next, I want to test for an interaction between type of transplant and disease type using main effects and interaction terms.
我将要使用的代码:
n<-length(myData$time)
n
for (i in 1:n){
if (myData$(here?)[i]==2)
myData$W1[i] <-1
else myData$W1[i]<-0
}
for (i in 1:n){
if (myData$(here?)[i]==2)
myData$W2[i] <-1
else myData$W2[i]<-0
}
myData
Coxfit.W<-coxph(Surv(time,censor)~W1+W2+W1*W2, data = myData)
summary(Coxfit.W)
我不确定上面的代码应该在此处(myData $(here?)
)中写什么.
I'm not sure what it should be written in here (myData$(here?)
from the above code.
推荐答案
这看起来像是俄亥俄州立大学的骨髓移植研究.
This looks like the bone marrow transplant study at Ohio State University.
正如您所提到的,每组每个组的观察值数量不同.我会考虑最后将每个子组中的行绑定在一起.
As you mentioned, each group has different numbers of observations per group. I would consider binding the rows from each subgroup together in the end.
首先,将为每个组创建一个数据框.我将添加一列以指示它们属于哪个组.因此,例如,在 df_Allo_NHL
中,所有观测值对于 group
都具有 Allo NHL
:
First, would create a data frame for each group. I would add a column indicating which group they belonged to. So, for example, in df_Allo_NHL
would have all of the observations have Allo NHL
for group
:
df_Allo_NHL <- data.frame(group = "Allo NHL",
time = c(28,32,49,84,357,933,1078,1183,1560,2114,2144),
censor = c(rep(1,5), rep(0,6)))
或者仅添加到您已经拥有的两个向量中:
Or just adding to the 2 vectors you have already:
df_Allo_NHL <- data.frame(group = "Allo NHL", time = time_Allo_NHL, censor = censor_Allo_NHL)
然后,一旦您拥有4个数据框,就可以将它们组合起来.一种方法是使用 Reduce
并将所有数据帧放在列表中.长格式的最终结果应该已经准备好用于Cox比例风险分析,并且您将可以包含 group
.(从模型表中添加了Z1和Z2.)
Then once you have your 4 data frames, you can combine them. One way to do this is by using Reduce
and putting all your data frames in a list. The final result should be ready for cox proportional hazards analysis, in long form, and you will have group
available to include. ( Z1 and Z2 added from table for model.)
time_Allo_NHL<- c(28,32,49,84,357,933,1078,1183,1560,2114,2144)
censor_Allo_NHL<- c(rep(1,5), rep(0,6))
df_Allo_NHL <- data.frame(group = "Allo NHL",
time = time_Allo_NHL,
censor = censor_Allo_NHL,
Z1 = c(90,30,40,60,70,90,100,90,80,80,90),
Z2 = c(24,7,8,10,42,9,16,16,20,27,5))
time_Auto_NHL<- c(42,53,57,63,81,140,176,210,252,476,524,1037)
censor_Auto_NHL<- c(rep(1,7), rep(0,1), rep(1,1), rep(0,1), rep(1,1), rep(0,1))
df_Auto_NHL <- data.frame(group = "Auto NHL",
time = time_Auto_NHL,
censor = censor_Auto_NHL,
Z1 = c(80,90,30,60,50,100,80,90,90,90,90,90),
Z2 = c(19,17,9,13,12,11,38,16,21,24,39,84))
time_Allo_HOD<- c(2,4,72,77,79)
censor_Allo_HOD<- c(rep(1,5))
df_Allo_HOD <- data.frame(group = "Allo HOD",
time = time_Allo_HOD,
censor = censor_Allo_HOD,
Z1 = c(20,50,80,60,70),
Z2 = c(34,28,59,102,71))
time_Auto_HOD<- c(30,36,41,52,62,108,132,180,307,406,446,484,748,1290,1345)
censor_Auto_HOD<- c(rep(1,7), rep(0,8))
df_Auto_HOD <- data.frame(group = "Auto HOD",
time = time_Auto_HOD,
censor = censor_Auto_HOD,
Z1 = c(90,80,70,60,90,70,60,100,100,100,100,90,90,90,80),
Z2 = c(73,61,34,18,40,65,17,61,24,48,52,84,171,20,98))
myData <- Reduce(rbind, list(df_Allo_NHL, df_Auto_NHL, df_Allo_HOD, df_Auto_HOD))
修改
如果继续进行并添加 Z1
(Karnofsky评分)和 Z2
(从诊断到移植的等待时间),则可以执行以下CPH生存模型. group
已经是一个因素,默认情况下,第一级 Allo NHL
将是参考类别.
If you go ahead and also add Z1
(Karnofsky Score) and Z2
(waiting time from diagnosis to transplant), you can do the CPH survival model like this below. group
is already a factor and the first level Allo NHL
would by default be there reference category.
library(survival)
Coxfit<-coxph(Surv(time,censor)~group+Z1+Z2, data = myData)
summary(Coxfit)
输出
Call:
coxph(formula = Surv(time, censor) ~ group + Z1 + Z2, data = myData)
n= 43, number of events= 26
coef exp(coef) se(coef) z Pr(>|z|)
groupAuto NHL 0.77357 2.16748 0.58631 1.319 0.18704
groupAllo HOD 2.73673 15.43639 0.94081 2.909 0.00363 **
groupAuto HOD 1.06293 2.89485 0.63494 1.674 0.09412 .
Z1 -0.05052 0.95074 0.01222 -4.135 3.55e-05 ***
Z2 -0.01660 0.98354 0.01002 -1.656 0.09769 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
groupAuto NHL 2.1675 0.46136 0.6869 6.8395
groupAllo HOD 15.4364 0.06478 2.4419 97.5818
groupAuto HOD 2.8948 0.34544 0.8340 10.0481
Z1 0.9507 1.05181 0.9282 0.9738
Z2 0.9835 1.01674 0.9644 1.0030
Concordance= 0.783 (se = 0.059 )
Likelihood ratio test= 32.48 on 5 df, p=5e-06
Wald test = 28.48 on 5 df, p=3e-05
Score (logrank) test = 39.45 on 5 df, p=2e-07
数据
group time censor Z1 Z2
1 Allo NHL 28 1 90 24
2 Allo NHL 32 1 30 7
3 Allo NHL 49 1 40 8
4 Allo NHL 84 1 60 10
5 Allo NHL 357 1 70 42
6 Allo NHL 933 0 90 9
7 Allo NHL 1078 0 100 16
8 Allo NHL 1183 0 90 16
9 Allo NHL 1560 0 80 20
10 Allo NHL 2114 0 80 27
11 Allo NHL 2144 0 90 5
12 Auto NHL 42 1 80 19
13 Auto NHL 53 1 90 17
14 Auto NHL 57 1 30 9
15 Auto NHL 63 1 60 13
16 Auto NHL 81 1 50 12
17 Auto NHL 140 1 100 11
18 Auto NHL 176 1 80 38
19 Auto NHL 210 0 90 16
20 Auto NHL 252 1 90 21
21 Auto NHL 476 0 90 24
22 Auto NHL 524 1 90 39
23 Auto NHL 1037 0 90 84
24 Allo HOD 2 1 20 34
25 Allo HOD 4 1 50 28
26 Allo HOD 72 1 80 59
27 Allo HOD 77 1 60 102
28 Allo HOD 79 1 70 71
29 Auto HOD 30 1 90 73
30 Auto HOD 36 1 80 61
31 Auto HOD 41 1 70 34
32 Auto HOD 52 1 60 18
33 Auto HOD 62 1 90 40
34 Auto HOD 108 1 70 65
35 Auto HOD 132 1 60 17
36 Auto HOD 180 0 100 61
37 Auto HOD 307 0 100 24
38 Auto HOD 406 0 100 48
39 Auto HOD 446 0 100 52
40 Auto HOD 484 0 90 84
41 Auto HOD 748 0 90 171
42 Auto HOD 1290 0 90 20
43 Auto HOD 1345 0 80 98
这篇关于考克斯比例风险模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!