R: How to pivot and count a data.frame (ex: list of medical conditions and the number of patients with each)
Problem description
I'm trying to get better with dplyr and tidyr but I'm not used to "thinking in R". An example may be best. The table I've generated from my data in sql looks like this:
╔═══════════╦════════════╦═════╦════════╦══════════════╦══════════╦══════════════╗
║ patientid ║ had_stroke ║ age ║ gender ║ hypertension ║ diabetes ║ estrogen HRT ║
╠═══════════╬════════════╬═════╬════════╬══════════════╬══════════╬══════════════╣
║ 934988    ║ 1          ║ 65  ║ M      ║ 1            ║ 1        ║ 0            ║
║ 94044     ║ 0          ║ 69  ║ F      ║ 1            ║ 0        ║ 0            ║
║ 689348    ║ 0          ║ 56  ║ F      ║ 0            ║ 1        ║ 1            ║
║ 902498    ║ 1          ║ 45  ║ M      ║ 0            ║ 0        ║ 1            ║
║ …         ║            ║     ║        ║              ║          ║              ║
╚═══════════╩════════════╩═════╩════════╩══════════════╩══════════╩══════════════╝

I would like to create an output table that conveys the following information:

╔══════════════╦════════╦════════════╦════════════╦════════════╦════════════╗
║              ║ total  ║ M lt50 yo  ║ F lt50 yo  ║ M gte50 yo ║ F gte50 yo ║
╠══════════════╬════════╬════════════╬════════════╬════════════╬════════════╣
║ estrogen HRT ║ 347    ║ 2          ║ 65         ║ 4          ║ 97         ║
║ diabetes     ║ 13922  ║ 54         ║ 73         ║ 192        ║ 247        ║
║ hypertension ║ 8210   ║ 102        ║ 187        ║ 443        ║ 574        ║
╚══════════════╩════════╩════════════╩════════════╩════════════╩════════════╝

Total is the total number of patients with that comorbidity (easy enough: sum(data$estrogen == 1) etc.). The other cells are now the number of patients with that comorbidity in that age and gender stratification where had_stroke == 1.
I'd love to just get a general idea of how to approach problems like this as it seems like a pretty fundamental way to transform data. If the total column makes it funky then feel free to exclude that.
Solution

Try something simpler.

I assume that you have a data.frame called data. Here is a toy data set:

    set.seed(0)
    data <- data.frame(
      estrogen     = runif(100) < .10,
      diabetes     = runif(100) < .15,
      hypertension = runif(100) < .20,
      groups       = cut(runif(100), c(0, .1, .4, .7, 1),
                         labels = c("my", "fy", "mo", "fo"))
    )
In your real data, add a new variable to the data frame that holds each patient's group, like groups in the toy data.
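A minimal sketch of that step, using the four example rows from the question. The data frame name patients, the column name estrogen (for "estrogen HRT"), the helper age_band, and the group labels are assumptions made for illustration. It also keeps only the rows with had_stroke == 1, which the stratified cells in the question call for.

```r
# Hypothetical data frame built from the example rows in the question
patients <- data.frame(
  patientid    = c(934988, 94044, 689348, 902498),
  had_stroke   = c(1, 0, 0, 1),
  age          = c(65, 69, 56, 45),
  gender       = c("M", "F", "F", "M"),
  hypertension = c(1, 1, 0, 0),
  diabetes     = c(1, 0, 1, 0),
  estrogen     = c(0, 0, 1, 1)
)

# Age band: under 50 versus 50 and over
age_band <- ifelse(patients$age < 50, "lt50", "gte50")

# Combine gender and age band into one factor; fixing the levels
# keeps empty strata in the tables and sets the column order
patients$groups <- factor(
  paste(patients$gender, age_band),
  levels = c("M lt50", "F lt50", "M gte50", "F gte50")
)

# Keep only patients who had a stroke before counting by group
strokes <- patients[patients$had_stroke == 1, ]
table(strokes$groups)
```

Declaring the levels explicitly means a stratum with no patients still shows up as a zero count instead of disappearing from the output.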
Then, use table() to get the summaries:

    res <- rbind(
      table(data$estrogen, data$groups)[2, ],      # row 2 is the TRUE row
      table(data$diabetes, data$groups)[2, ],
      table(data$hypertension, data$groups)[2, ]
    )
    res <- cbind(apply(res, 1, sum), res)          # prepend the row totals
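If there are many condition columns, repeating table() for each one gets tedious. A possible alternative, sketched here under the same toy data, loops over the column names with sapply() and counts the TRUE values per group with tapply(). The names conds and counts are introduced only for this illustration and are not part of the original answer.

```r
set.seed(0)
data <- data.frame(
  estrogen     = runif(100) < .10,
  diabetes     = runif(100) < .15,
  hypertension = runif(100) < .20,
  groups       = cut(runif(100), c(0, .1, .4, .7, 1),
                     labels = c("my", "fy", "mo", "fo"))
)

# Condition columns to summarise (illustrative name)
conds <- c("estrogen", "diabetes", "hypertension")

# For each condition, sum the TRUE values within each group;
# sapply() returns groups x conditions, so transpose it
counts <- t(sapply(conds, function(v) tapply(data[[v]], data$groups, sum)))

# Prepend the row totals; row and column names come along for free
counts <- cbind(Total = rowSums(counts), counts)
counts
```

Note that tapply() returns NA for a group that contains no rows at all, so this sketch assumes every group is populated. The rest of the answer continues with res from the rbind() version.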
Finally, use colnames(res) and rownames(res) to set appropriate names for the columns and rows:

    colnames(res)[1] <- "Total"
    rownames(res) <- c("estrogen", "diabetes", "hypertension")
Results:

                 Total my fy mo fo
    estrogen        12  2  2  4  4
    diabetes        28  1  8 11  8
    hypertension    27  1 10 11  5
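Since the question mentions dplyr and tidyr, the same table can also be sketched in that style: reshape the condition columns to long form, count per condition and group, then spread the groups back into columns. This is only one possible approach, not the original answer's method. It assumes dplyr (1.0 or later) and tidyr (1.0 or later) are installed, and the names condition, present, n and res_tidy are illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(0)
data <- data.frame(
  estrogen     = runif(100) < .10,
  diabetes     = runif(100) < .15,
  hypertension = runif(100) < .20,
  groups       = cut(runif(100), c(0, .1, .4, .7, 1),
                     labels = c("my", "fy", "mo", "fo"))
)

res_tidy <- data %>%
  # One row per patient and condition
  pivot_longer(c(estrogen, diabetes, hypertension),
               names_to = "condition", values_to = "present") %>%
  # Count patients with each condition inside each group
  group_by(condition, groups) %>%
  summarise(n = sum(present), .groups = "drop") %>%
  # One column per group, filling missing combinations with 0
  pivot_wider(names_from = groups, values_from = n, values_fill = 0L) %>%
  # Row total across the four groups, moved to the front
  mutate(Total = my + fy + mo + fo) %>%
  select(condition, Total, everything())

res_tidy
```

With real data, a filter(had_stroke == 1) step before pivot_longer() would restrict the stratified counts to stroke patients, as the question describes.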