R gtsummary Row with Categorical Variable Totals


Problem Description

I have a dataset of approximately 700,000 patients where I have hospital site IDs (factor variable). I would like to create a row where the number of hospitals is visible (this is separate to the number of patients). I have 3 categorical variables as my columns in addition to an overall column.

At the moment, there is a separate row for each hospital id with the number of patients in each site for each category.

My code is as follows:

t1 <- PIR %>%
  select(siteidn, countryname) %>%
  tbl_summary(
    by = countryname,
    missing = "no",
    label = list(siteidn = "Number of ICUs"),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  bold_labels() %>%
  italicize_levels() %>%
  add_overall()

t2 <- PIR %>%
  select(siteidn, hospt) %>%
  tbl_summary(
    by = hospt,
    missing = "no",
    label = list(siteidn = "Number of ICUs"),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  bold_labels() %>%
  italicize_levels()

t3 <- PIR %>%
  select(siteidn, iculevelname) %>%
  tbl_summary(
    by = iculevelname,
    missing = "no",
    label = list(siteidn = "Number of ICUs"),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) %>%
  bold_labels() %>%
  italicize_levels()

tbl_merge(
  tbls = list(t1, t2, t3),
  tab_spanner = c("**Country**", "**Hospital Type**", "**ICU Level**")
)

This produces the following table:

Table 1

As can be seen, there is a separate row for each hospital ID. I'd like to have a single row where there are totals of the number of hospitals in each tier (i.e. total number of hospitals in Aus, NZ, Metropolitan, etc).

My questions are:

  1. Is there a way to get a total row for a factor variable that is not the patient number?
  2. Is it possible to have an overall column inserted after merging the tables (so that the overall column does not come under the Country heading)?
  3. Is there a way to create a row for the number of patients and not have those details in the headings?

Thanks all for your time.

Ben

ADDIT: Here is an image of what I would like the table to look like. I apologise for its crudeness. I would like to have just one row for the factor variable giving the total Number of ICUs, rather than a row for each ICU with the number of patients in it (Red Ink).

Additionally, is there a way to group the 2 rows under a common heading similar to the factor variables (Green Ink)?

I appreciate that my R skills are rudimentary. Thank you all for your patience!

Ben

Solution

I agree with Ben: it is always good to include a dataset we can run on our machine, and an example of what you would like the output to look like. Below is a code example that addresses most of your questions.

  1. Is there a way to get a total row for a factor variable that is not the patient number?

I am not sure what you're looking for here. More details please.

  2. Is it possible to have an overall column inserted after merging the tables (so that the overall column does not come under the Country heading)?

Yes, you can use the modify_spanning_header() function to remove the header above the Overall column.

  3. Is there a way to create a row for the number of patients and not have those details in the headings?

Yes, if you create a new column in your dataset that is TRUE for all observations, we can summarize that column and report the N.

Also, if you're only doing cross tabulations of a single variable, you should look into the tbl_cross() function. It adds the total rows automatically.
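As a rough illustration of the tbl_cross() suggestion, here is a minimal sketch. The `toy` data frame is made up for this example (it is not the poster's PIR dataset), and `percent = "column"` is only one of the available options.

```r
library(gtsummary)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  hospt       = factor(c("Metro", "Metro", "Rural", "Metro", "Rural", "Rural")),
  countryname = factor(c("NZ", "Australia", "Australia", "NZ", "NZ", "Australia"))
)

# Cross-tabulate hospital type (rows) by country (columns);
# tbl_cross() adds the total row and total column by default
cross_tbl <- tbl_cross(
  data    = toy,
  row     = hospt,
  col     = countryname,
  percent = "column"
)

cross_tbl
```

This only covers a single row variable against a single column variable, so the merged layout in the question still needs tbl_summary() and tbl_merge() as in the code that follows.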

library(gtsummary)
library(tidyverse)
set.seed(20210108)

# create dummy dataset
PIR <- 
  tibble(
    siteidn = sample(c("1325", "1324", "1329"), 100, replace = TRUE) %>% factor(),
    countryname = sample(c("NZ", "Australia"), 100, replace = TRUE) %>% factor(),
    hospt = sample(c("Metro", "Rural"), 100, replace = TRUE) %>% factor(),
    patient = TRUE
  ) %>%
  group_by(siteidn) %>%
  mutate(
    count_site = row_number() == 1L # one TRUE per site
  ) %>%
  ungroup() %>%
  labelled::set_variable_labels(siteidn = "Number of ICUs", # Assigning labels 
                                patient = "N")

t1 <- PIR %>% 
  select(patient, siteidn, countryname) %>% 
  tbl_summary(
    by = countryname,
    missing = "no", 
    statistic = patient ~ "{n}" # only print N for the top row
  ) %>% 
  # remove the Ns from the header row (`stat_by =` is deprecated in newer gtsummary releases)
  modify_header(all_stat_cols() ~ "**{level}**") %>%
  add_overall(col_label = "**Overall**")
t2 <- PIR %>% 
  select(patient, siteidn, hospt) %>% 
  tbl_summary(
    by = hospt,
    missing = "no", 
    statistic = patient ~ "{n}" # only print N for the top row
  ) %>%
  modify_header(all_stat_cols() ~ "**{level}**") # remove the Ns from the header row

tbl <-
  tbl_merge(
    tbls = list(t1, t2),
    tab_spanner = c("**Country**", "**Hospital Type**")
  ) %>%
  bold_labels() %>% 
  italicize_levels() %>%
  # remove spanning header for overall column, use `show_header_names(tbl)` to print column names
  modify_spanning_header(stat_0_1 ~ NA) %>%
  modify_footnote(everything() ~ NA) # remove footnote, as it's not informative in this setting
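
The `stat_0_1` name passed to modify_spanning_header() comes from the way tbl_merge() suffixes each input table's columns with `_1`, `_2`, and so on. A small sketch of how one might look the names up, using the `trial` dataset that ships with gtsummary rather than the PIR data (the object names `t_a`, `t_b` and `merged` are placeholders for this example):

```r
library(gtsummary)
library(dplyr)

# First table has an Overall column, second does not
t_a <- tbl_summary(trial, by = trt, include = grade) %>%
  add_overall()
t_b <- tbl_summary(trial, by = response, include = grade, missing = "no")

merged <- tbl_merge(
  tbls = list(t_a, t_b),
  tab_spanner = c("**Treatment**", "**Response**")
)

# Print the column names and their current headers
show_header_names(merged)

# The same names can be read off the underlying data frame
stat_cols <- grep("^stat_", names(merged$table_body), value = TRUE)
stat_cols
```

The Overall column of the first table appears as `stat_0_1`, which is the column whose spanning header is removed in the code above.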

EDIT: After clarification from the original poster, adding another example of how one could present the Ns.

The table below shows two ways to show the Ns for the patients and the number of sites. The first uses two rows, one per variable; the last row shows how the same information can be presented on a single line.

t1 <- PIR %>% 
  select(patient, site_only = count_site, combination = count_site, countryname) %>% 
  tbl_summary(
    by = countryname,
    missing = "no", 
    statistic = list(c(patient, site_only) ~ "{n}", 
                     combination ~ "Site N {n}; Total N {N}")
  )
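
One caveat with the `count_site` flag: it marks a single row per site, so each site is counted in exactly one column. That gives the right totals when every site belongs to one country (as real hospitals do), but not in the dummy data above, where `siteidn` and `countryname` are sampled independently. A quick way to check the site counts is to compute them directly with dplyr. The sketch below uses a hypothetical `toy` tibble in which each site sits in one country:

```r
library(dplyr)

# Hypothetical toy data: every site belongs to exactly one country
toy <- tibble(
  siteidn     = factor(c("1324", "1324", "1325", "1325", "1325", "1329")),
  countryname = factor(c("NZ", "NZ", "Australia", "Australia", "Australia", "Australia"))
)

# Direct count of patients and distinct sites per country
site_counts <- toy %>%
  group_by(countryname) %>%
  summarise(
    n_patients = n(),
    n_sites    = n_distinct(siteidn),
    .groups    = "drop"
  )

# Same flag as in the answer: TRUE on the first row of each site
flag_counts <- toy %>%
  group_by(siteidn) %>%
  mutate(count_site = row_number() == 1L) %>%
  group_by(countryname) %>%
  summarise(n_sites_flag = sum(count_site), .groups = "drop")

# Side-by-side comparison; the two site counts should agree
left_join(site_counts, flag_counts, by = "countryname")
```

If the two site counts disagree on the real data, some site appears under more than one level of the `by` variable, and the flag would need to be built per site and per level instead.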
