Plotting predicted survival curves for continuous covariates in ggplot

Question

How can I plot survival curves for representative values of a continuous covariate in a Cox proportional hazards model? Specifically, I would like to do this in ggplot using an object of class c("survfit.cox", "survfit").

This may seem like a question that has already been answered, but I have searched through everything on SO with the terms 'survfit' and 'newdata' (plus many other search terms). The thread that comes closest to answering my question so far is: Plot Kaplan-Meier for Cox regression

In keeping with the reproducible example offered in one of the answers to that post:

url <- "http://socserv.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
# stringsAsFactors = TRUE keeps fin and race as factors, so levels() below
# returns their levels rather than NULL in R >= 4.0 (where read.table()
# no longer converts strings by default).
# The same data set also ships as Rossi in the carData package.
df <- read.table(url, header = TRUE, stringsAsFactors = TRUE)

library(dplyr)
library(ggplot2)
library(survival)
library(magrittr)
library(broom)

# Identifying the 25th and 75th percentiles for prio (continuous covariate)

summary(df$prio)

# Cox proportional hazards model with other covariates
# 'prio' is our explanatory variable of interest

m1 <- coxph(Surv(week, arrest) ~ fin + age + race + prio,
            data = df)

# Creating new df to get survival predictions
# Want separate curves for the different 'fin' and 'race'
# groups as well as the 25th and 75th percentile of prio

newdf <- df %$%
  expand.grid(fin = levels(fin),
              age = 30,
              race = levels(race),
              prio = c(1, 4))

# Obtain the fitted survival curve, then tidy
# into a dataframe that can be used in ggplot

survcurv <- survfit(m1, newdata = newdf) %>%
  tidy()

The problem is that once I have this dataframe called survcurv, I cannot tell which of the 'estimate' variables belongs to which pattern, because none of the original variables are retained. For example, which of the 'estimate' variables represents the fitted curve for a 30-year-old with race = 'other', prio = '4', fin = 'no'?
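Before tidying, the survfit object itself already encodes the mapping. A minimal sketch, assuming the lung data bundled with survival as a stand-in for the Rossi data (fit_cox, patterns and long are hypothetical names, and age plays the role of prio): when newdata has several rows, the surv component is a matrix with one column per row of newdata, in the same order, so the covariates can be attached by position.

```r
library(survival)  # recommended package, ships with R

# Stand-in model on the lung data bundled with survival (not the Rossi data)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hypothetical covariate patterns: both sexes at the quartiles of age
patterns <- expand.grid(sex = c(1, 2),
                        age = unname(quantile(lung$age, c(0.25, 0.75))))

sf <- survfit(fit_cox, newdata = patterns)

# One row per event time, one column per row of 'patterns' (same order)
dim(sf$surv)

# Stack the columns into a long data frame, repeating each pattern's
# covariate values alongside its curve
long <- do.call(rbind, lapply(seq_len(nrow(patterns)), function(i) {
  data.frame(time      = sf$time,
             estimate  = sf$surv[, i],
             patterns[rep(i, length(sf$time)), , drop = FALSE],
             row.names = NULL)
}))
head(long)
```

Because the column order of sf$surv follows the row order of newdata, the same positional matching applies to numbered estimate columns derived from it.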

In all other examples I've seen, one usually passes the survfit object to the generic plot() function and does not add a legend. I want to use ggplot and add a legend for each of the predicted curves.

In my own dataset, the model is a lot more complex and there are many more curves than I show here, so, as you can imagine, seeing 40 different 'estimate.1' .. 'estimate.40' variables makes it hard to tell which curve is which.

Answer

Try defining your survcurv like this:

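# Fit one curve per row of newdf and bind that row's covariate values
# onto its tidied estimates, so every estimate carries its pattern
survcurv <- 
  lapply(seq_len(nrow(newdf)),
         function(x, m1, newdata){
           cbind(newdata[x, ], survfit(m1, newdata = newdata[x, ]) %>% tidy())
         },
         m1, 
         newdf) %>%
  bind_rows()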

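This will include all of the predictor values (fin, age, race, prio) as columns alongside the predicted estimates, stacked into one long data frame, so each curve can be grouped and labelled in ggplot.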
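With the covariates on every row, a legend entry per curve is just an aesthetic mapping. The sketch below again assumes the lung data in place of Rossi (fit_cox, newdf and the pattern label are illustrative names), follows the answer's one-survfit()-per-row idea, and reads time and surv directly from each fit instead of calling broom::tidy(), to keep the dependencies to survival and ggplot2.

```r
library(survival)
library(ggplot2)

# Stand-in model on the lung data bundled with survival (not the Rossi data)
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Covariate patterns: quartiles of age crossed with both sexes
newdf <- expand.grid(age = unname(quantile(lung$age, c(0.25, 0.75))),
                     sex = c(1, 2))

# One survfit() call per row, with that row's covariates bound on as columns
survcurv <- do.call(rbind, lapply(seq_len(nrow(newdf)), function(i) {
  sf <- survfit(fit_cox, newdata = newdf[i, , drop = FALSE])
  cbind(newdf[rep(i, length(sf$time)), , drop = FALSE],
        time = sf$time, estimate = as.vector(sf$surv))
}))

# A readable label per curve, built from its covariate values
survcurv$pattern <- factor(paste0("age ", survcurv$age, ", ",
                                  ifelse(survcurv$sex == 1, "male", "female")))

# Colour by pattern gives one legend entry per predicted curve
p <- ggplot(survcurv, aes(x = time, y = estimate, colour = pattern)) +
  geom_step() +
  labs(x = "Days", y = "Predicted survival", colour = "Covariate pattern")
print(p)
```

With many patterns, mapping one covariate to colour and another to linetype (or using facet_wrap()) usually keeps the legend more readable than a single combined label.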
