R: How to pivot and count data.frame (ex: list of medical conditions and the number of patients with each)

Problem description

I'm trying to get better with dplyr and tidyr but I'm not used to "thinking in R". An example may be best. The table I've generated from my data in sql looks like this:

╔═══════════╦════════════╦═════╦════════╦══════════════╦══════════╦══════════════╗
║ patientid ║ had_stroke ║ age ║ gender ║ hypertension ║ diabetes ║ estrogen HRT ║
╠═══════════╬════════════╬═════╬════════╬══════════════╬══════════╬══════════════╣
║ 934988    ║          1 ║  65 ║ M      ║            1 ║        1 ║            0 ║
║ 94044     ║          0 ║  69 ║ F      ║            1 ║        0 ║            0 ║
║ 689348    ║          0 ║  56 ║ F      ║            0 ║        1 ║            1 ║
║ 902498    ║          1 ║  45 ║ M      ║            0 ║        0 ║            1 ║
║ …         ║            ║     ║        ║              ║          ║              ║
╚═══════════╩════════════╩═════╩════════╩══════════════╩══════════╩══════════════╝

I would like to create an output table that conveys the following information:

╔══════════════╦════════╦══════════╦══════════╦══════════╦═══════════╗
║              ║ total  ║M lt50 yo ║F lt50 yo ║M gte50yo ║F gte 50yo ║
╠══════════════╬════════╬══════════╬══════════╬══════════╬═══════════╣
║ estrogen HRT ║    347 ║        2 ║       65 ║        4 ║        97 ║
║ diabetes     ║  13922 ║       54 ║       73 ║      192 ║       247 ║
║ hypertension ║   8210 ║      102 ║      187 ║      443 ║       574 ║
╚══════════════╩════════╩══════════╩══════════╩══════════╩═══════════╝

Total is the total number of patients with that comorbidity (easy enough: sum(data$estrogen == 1) etc). The other cells are now the number of patients with that comorbidity in that age and gender stratification where had_stroke==1.

I'd love to just get a general idea of how to approach problems like this as it seems like a pretty fundamental way to transform data. If the total column makes it funky then feel free to exclude that.

Solution

Try a simpler approach.

I assume that you have a data.frame called data. Here is a toy data set.

set.seed(0)
data <- data.frame(estrogen = runif(100) < .10,
               diabetes = runif(100) < .15,
               hypertension = runif(100) < .20,
               groups = cut(runif(100), c(0,.1,.4,.7,1), labels = c("my", "fy", "mo", "fo")))

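With your real data, add a new variable to the data frame for the groups (the gender and age strata). In the toy data set above, the groups column plays that role.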
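One way to build such a grouping variable from the columns in the question is sketched below. The `patients` data frame, the `estrogen` column name (standing in for "estrogen HRT"), and the stratum labels are assumptions made for illustration; only the first four rows shown in the question are used.

```r
# Hypothetical patient data shaped like the table in the question
patients <- data.frame(
  patientid    = c(934988, 94044, 689348, 902498),
  had_stroke   = c(1, 0, 0, 1),
  age          = c(65, 69, 56, 45),
  gender       = c("M", "F", "F", "M"),
  hypertension = c(1, 1, 0, 0),
  diabetes     = c(1, 0, 1, 0),
  estrogen     = c(0, 0, 1, 1)
)

# Age band: under 50 vs. 50 and over
age_band <- ifelse(patients$age < 50, "lt50", "gte50")

# Combine gender and age band into one factor with a fixed level order,
# so that empty strata still show up as columns in table()
patients$groups <- factor(
  paste(patients$gender, age_band, sep = "_"),
  levels = c("M_lt50", "F_lt50", "M_gte50", "F_gte50")
)

table(patients$groups)
```

Fixing the factor levels up front keeps the column order of the final table stable even when a stratum has no patients.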

Then, use table() to get the summaries. Row 2 of each table holds the TRUE counts:

res <- rbind(
  table(data$estrogen, data$groups)[2,],
  table(data$diabetes, data$groups)[2,],
  table(data$hypertension, data$groups)[2,]
)
res <- cbind(apply(res, 1, sum), res)
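If there are many condition columns, the same table() idea can be applied in a loop rather than written out once per column. The following is a minimal sketch on the same toy data; the names `conditions`, `counts`, and `res2` are introduced here for illustration. Wrapping each column in factor() with both levels is an added safeguard, so that a condition with no TRUE values still yields a row of zeros.

```r
set.seed(0)
data <- data.frame(estrogen = runif(100) < .10,
                   diabetes = runif(100) < .15,
                   hypertension = runif(100) < .20,
                   groups = cut(runif(100), c(0, .1, .4, .7, 1),
                                labels = c("my", "fy", "mo", "fo")))

# Columns to summarise (illustrative name)
conditions <- c("estrogen", "diabetes", "hypertension")

# For each condition, cross-tabulate against groups and keep the TRUE row.
# sapply() returns groups in rows, so transpose to get one row per condition.
counts <- t(sapply(data[conditions], function(x) {
  table(factor(x, levels = c(FALSE, TRUE)), data$groups)["TRUE", ]
}))

# Prepend the row totals as a named column
res2 <- cbind(Total = rowSums(counts), counts)
res2
```

Because the row and column names come from the data itself, no separate renaming step is needed in this variant.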

Finally, use colnames(res) and rownames(res) to set appropriate names for the columns and rows.

colnames(res)[1] <- "Total"
rownames(res) <- c("estrogen", "diabetes", "hypertension")

Results

             Total my fy mo fo
estrogen        12  2  2  4  4
diabetes        28  1  8 11  8
hypertension    27  1 10 11  5
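Since the question mentions dplyr and tidyr, here is a rough sketch of how the same table might be built with those packages, including the had_stroke == 1 restriction on the stratified cells. The simulated `patients` data, the `estrogen` column name, and the `stratum` labels are assumptions for illustration, not the asker's real data. The general pattern is: reshape the condition columns to long form, count, then spread the strata back out into columns.

```r
library(dplyr)
library(tidyr)

# Simulated data shaped like the question's table (hypothetical values)
set.seed(1)
n <- 200
patients <- data.frame(
  had_stroke   = rbinom(n, 1, 0.3),
  age          = sample(30:90, n, replace = TRUE),
  gender       = sample(c("M", "F"), n, replace = TRUE),
  hypertension = rbinom(n, 1, 0.4),
  diabetes     = rbinom(n, 1, 0.3),
  estrogen     = rbinom(n, 1, 0.1)
)

# Long form: one row per patient per condition that is present
long <- patients %>%
  mutate(stratum = paste(gender, ifelse(age < 50, "lt50", "gte50"), sep = "_")) %>%
  pivot_longer(c(hypertension, diabetes, estrogen),
               names_to = "condition", values_to = "present") %>%
  filter(present == 1)

# Total patients with each condition, regardless of stroke status
totals <- count(long, condition, name = "total")

# Patients with each condition per stratum, restricted to had_stroke == 1
by_stratum <- long %>%
  filter(had_stroke == 1) %>%
  count(condition, stratum) %>%
  pivot_wider(names_from = stratum, values_from = n, values_fill = 0)

result <- left_join(totals, by_stratum, by = "condition")
result
```

Note that with this layout the stratified columns do not sum to the total column, because the total counts all patients with the condition while the strata count only those who had a stroke.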
