Calculate within-variable percentages for a stacked, grouped barplot
Question
I would like to plot a grouped, stacked barplot for several different variables (Var1PA, Var2PA) by calculating how many times each variable was present or absent within the cases and within the controls.
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
                 Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))
I want to calculate the percentage of Present and Absent for the cases and for the controls within each variable, but I am unable to do it with prop.table:
vars <- c('Var1PA', 'Var2PA')
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
## The line above does not calculate the Present/Absent percentages
## separately for cases and controls within each variable
If I could do that, I could then use ggplot2 to plot it:
ggplot(tt, aes(Disease, Freq)) +
  geom_bar(aes(fill = Var1), position = "stack", stat = "identity") +
  facet_grid(~vars)
How do I get the percentages for cases (Present and Absent) and controls (Present and Absent) for each of the variables? Thanks!
Recommended answer
This is a fairly simple extension of the last question. In getting the data to long format, we treat Disease just like SampleID (an identifier column that is kept as-is rather than gathered); otherwise the code is identical:
library(ggplot2)
library(tidyr)
library(dplyr)
mdf = df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID, -Disease) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))
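Note that gather() has since been superseded in tidyr by pivot_longer() (available from tidyr 1.0.0). As a sketch of the same reshaping with the newer function, assuming a recent tidyr is installed (the name mdf_long is only illustrative and is not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
                 Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))

# Keep SampleID and Disease as identifier columns and stack the *PA columns
# into a Var (column name) / PA (Present or Absent) pair.
mdf_long <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  pivot_longer(cols = ends_with("PA"), names_to = "Var", values_to = "PA") %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))
```

The rows come out in a different order than with gather(), but the contents are the same, so the plotting code works unchanged on either data frame.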
We can then go directly to a plot, relying on ggplot to compute the percentages: position = "fill" rescales each stacked bar so that it sums to 1. This is identical to the plot in the previous question, but with Disease on the x-axis and the faceting added.
ggplot(mdf, aes(Disease)) +
  geom_bar(aes(fill = PA), position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~Var)
If you want the percentages in a data frame, we can do that with a little more manipulation:
df_summ = mdf %>%
  group_by(Disease, Var) %>%
  mutate(n = n()) %>%                   ## group size for each Disease/Var combination
  group_by(Disease, Var, PA) %>%
  summarize(Percent = n() / first(n))   ## fraction (0 to 1) of Present/Absent in each group
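A shorter way to get the same fractions, offered as a sketch rather than as the original answer's method, is to count each Disease/Var/PA combination and divide by the group total. The name df_summ_alt is hypothetical. As with df_summ, the Percent column holds fractions between 0 and 1; scales::percent only formats them as percentages on the axis.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
                 Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))

# Long format, as in the answer
mdf <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID, -Disease) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# Count rows per Disease/Var/PA, then divide by the Disease/Var total
df_summ_alt <- mdf %>%
  count(Disease, Var, PA) %>%
  group_by(Disease, Var) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup()
```

With the example data, the Case group of Var1PA comes out as 0.75 Present and 0.25 Absent, and every other Disease/Var group splits 0.5/0.5.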
With that summarized data frame, we can create the same plot as above more explicitly:
ggplot(df_summ, aes(Disease, Percent)) +
  geom_bar(aes(fill = PA), position = "stack", stat = "identity") +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(~Var)
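For reference, the prop.table idea from the question can also be made to work in base R by cross-tabulating Disease against each variable and taking row-wise proportions (margin = 1). This is only a minimal sketch, not part of the original answer; the names pct_list and tt_by_disease are illustrative.

```r
# Example data from the question
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
                 Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))

vars <- c("Var1PA", "Var2PA")

# For each variable: Disease x Present/Absent table, converted so that
# each Disease row sums to 100, then flattened to a data frame.
pct_list <- lapply(vars, function(v) {
  tab <- table(Disease = df$Disease, PA = df[[v]])
  out <- as.data.frame(prop.table(tab, margin = 1) * 100)
  out$Var <- v
  out
})

# One long data frame with columns Disease, PA, Freq (percent), Var
tt_by_disease <- do.call(rbind, pct_list)
```

The result has one row per Disease/PA/Var combination, with Freq on a 0 to 100 scale, so it can be plotted with Disease on the x-axis, Freq on the y-axis, PA as the fill, and Var as the facet.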