How do I simulate left-truncated Weibull failure time data in R?


Problem description


I want to simulate left-truncated failure time data from a Weibull distribution. My objective is to simulate data and then recover the coefficients (of x1, x2, x3, x4 and x5, which I used in the simulation) by fitting a Weibull regression model. Here xt=runif(N, 30, 80) denotes the start of the study, and the variable Tm <- qweibull(runif(N,pweibull(xt,shape = 7.5, scale = 82*exp(lp)),1), shape=7.5, scale=82*exp(lp)) denotes the failure time. However, whenever I fit the regression, I get this warning message:

```
Warning message:
In Surv(xt, time_M, event_M) : Stop time must be > start time, NA created
```

Here is my attempt:

```r
library(survival)  # Surv()
library(eha)       # weibreg()

N <- 10^5
H <- within(data.frame(xt=runif(N, 30, 80), x1=rnorm(N, 2, 1), x2=rnorm(N, -2, 1)), {
  x3 <- rnorm(N, 0.5*x1 + 0.5*x2, 2)
  x4 <- rnorm(N, 0.3*x1 + 0.3*x2 + 0.3*x3, 2)
  # multinomial-logit linear predictors and probabilities for the four levels of x5
  lp1 <- -2 + 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp2 <- -2 + 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp3 <- 0.5*x1 + 0.2*x2 + 0.1*x3 + 0.2*x4
  lp4 <- 0
  P1 <- exp(lp1)/(exp(lp2)+ exp(lp3)+1+exp(lp1))
  P2 <- exp(lp2)/(exp(lp1)+ exp(lp3)+1+exp(lp2))
  P3 <- exp(lp3)/(exp(lp2)+ exp(lp1)+1+exp(lp3))
  P4 <- 1/(exp(lp2)+ exp(lp3)+exp(lp1)+1)
  # draw one category per row; x5 is the index (1-4) of the chosen category
  mChoices <- t(apply(cbind(P1,P2,P3,P4), 1, rmultinom, n = 1, size = 1))
  x5 <- apply(mChoices, 1, function(x) which(x==1))
  # linear predictor that enters the Weibull scale
  lp <- 0.05*x1 + 0.2*x2 + 0.1*x3 + 0.02*x4 + log(1.5)*(x5==1) + log(5)*(x5==2) + log(2)*(x5==3)
  # failure time drawn from the Weibull distribution restricted to (xt, Inf) by inverse-CDF sampling
  Tm <- qweibull(runif(N,pweibull(xt,shape = 7.5, scale = 82*exp(lp)),1), shape=7.5, scale=82*exp(lp))
  # right censoring at age 100
  Cens <- 100
  time_M <- pmin(Tm,Cens)
  event_M <- time_M == Tm
})
res.full_M <- weibreg(Surv(xt, time_M, event_M) ~ x1 + x2 + x3 + x4 + factor(x5), data = H)
```


So can anyone help me modify this code so that the starting age (xt) is always less than the corresponding failure time (time_M), and so that the fitted regression model has coefficient values close to those in the following equation (lp <- 0.05*x1 + 0.2*x2 + 0.1*x3 + 0.02*x4 + log(1.5)*(x5==1) + log(5)*(x5==2) + log(2)*(x5==3))?
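A minimal sketch of one way to address both points, under stated assumptions. For a Weibull distribution with shape k and scale b, the cumulative hazard is H(t) = (t/b)^k, and conditional on T > xt the increment H(T) - H(xt) is Exp(1). Drawing T = b * ((xt/b)^k + E)^(1/k) with E ~ Exp(1) therefore samples the left-truncated distribution without the qweibull(runif(pweibull(...))) round trip, which loses precision when pweibull(xt, ...) is very close to 1; that precision loss is a plausible, but unconfirmed, source of the stop-time warning. Separately, with scale = 82*exp(lp) the hazard is proportional to exp(-7.5*lp), so a proportional-hazards fit (which is how the eha documentation describes weibreg) would estimate roughly -7.5 times the lp coefficients; using scale = 82*exp(-lp/7.5) makes the log hazard ratios equal the lp coefficients. The covariates, the coefficients 0.5 and -0.3, and N are hypothetical, and survival::coxph stands in for weibreg so the check only needs the survival package.

```r
library(survival)  # Surv() and coxph(); a recommended package shipped with R

set.seed(1)
N     <- 10000
shape <- 7.5   # Weibull shape, as in the question
b0    <- 82    # baseline Weibull scale, as in the question
beta1 <- 0.5   # hypothetical log hazard ratio for x1
beta2 <- -0.3  # hypothetical log hazard ratio for x2

dat <- data.frame(xt = runif(N, 30, 80),   # entry age = left-truncation time
                  x1 = rnorm(N),
                  x2 = rbinom(N, 1, 0.5))
lp <- beta1 * dat$x1 + beta2 * dat$x2

# Scale chosen so that H(t) = (t / b0)^shape * exp(lp), i.e. proportional hazards
b <- b0 * exp(-lp / shape)

# Left-truncated draw: H(T) = H(xt) + E with E ~ Exp(1), then invert H
H_entry <- (dat$xt / b)^shape
Tm <- b * (H_entry + rexp(N))^(1 / shape)

# Administrative right censoring at age 100, as in the question
dat$time  <- pmin(Tm, 100)
dat$event <- as.integer(Tm <= 100)

# Every stop time now exceeds its start time
stopifnot(all(dat$time > dat$xt))

# (start, stop] proportional-hazards fit; estimates should be near beta1 and beta2
fit <- coxph(Surv(xt, time, event) ~ x1 + x2, data = dat)
print(coef(fit))
```

In the question's model, factor(x5) uses level 1 as its reference, whereas lp has no term for x5 == 4, so (after the rescaling above) the fitted factor(x5) coefficients would estimate differences such as log(5) - log(1.5) rather than log(5); relevel(factor(x5), ref = "4") would align the reference level with lp.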

Recommended answer


Your first comment implies that you want (possibly censored) times from age 30 to diagnosis. You have two options: work with "survival times", or work with the date of each patient's 30th birthday and their date of diagnosis. The former is easier, because it makes your censoring rate easier to specify.

  1. Generate an uncensored survival time T from the distribution of your choice.
  2. Draw a random number from a Uniform(0, 1) distribution. If this number is less than your censoring rate, the observation is censored: go to 3. Otherwise, your uncensored observed survival time is T.
  3. Draw another random variable X from a Uniform(0, 1) distribution. Set T = T*X. This is your censored survival time.


This procedure will give you data from any distribution of survival times, censored at the rate of your choice.
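The three steps above can be sketched in R as follows. The Weibull shape and scale are placeholders borrowed from the question and stand in for whatever distribution of survival times (from age 30) you choose; cens_rate and the other object names are hypothetical. Because X is uniform on (0, 1), every censored time falls below the corresponding uncensored time.

```r
set.seed(2)
n         <- 10000
cens_rate <- 0.3   # hypothetical target censoring proportion

# Step 1: uncensored survival times; this Weibull is only a placeholder distribution
T_true <- rweibull(n, shape = 7.5, scale = 82)

# Step 2: each observation is censored with probability cens_rate
censored <- runif(n) < cens_rate

# Step 3: a censored observation gets time T * X with X ~ Uniform(0, 1)
obs_time <- ifelse(censored, T_true * runif(n), T_true)
event    <- as.integer(!censored)   # 1 = event observed, 0 = censored

print(mean(censored))   # should be close to cens_rate
```

In this scheme the censoring proportion is fixed directly by cens_rate rather than emerging from a separate censoring-time distribution, which is what makes it easy to target.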


However, my reading of your specification tells me that every participant will at some point be diagnosed with the condition of interest. There are no competing risks. Is this reasonable?


Your second comment is confusing. Is your time to event (a) "time from age 30 to diagnosis" (which would imply right censoring), or (b) "time from onset of disease until diagnosis" (which would imply left censoring and could also involve right censoring)? If (a), my solution still holds. If (b), you need to supply more information:

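  • What is the time course (distribution) of the time from age 30 to disease onset?
  • When, and how often, is the diagnostic procedure carried out?
  • What are the chances of the diagnostic procedure giving each of the following results: false positive, false negative, true positive, true negative?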


It's still possible to generate the data you want, but it's not as easy as in (a).
