Calculate within-variable percentages for a stacked, grouped barplot


Question

I would like to plot a grouped, stacked barplot for several variables (Var1PA, Var2PA) by counting how many times each variable was present or absent among the cases and among the controls.

df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
                 Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))

I want to calculate the percentage of present and absent for cases and for controls within each variable, but I am unable to do it with prop.table:

vars <- c('Var1PA', 'Var2PA')
tt <- data.frame(prop.table(as.table(sapply(df[, vars], table)), 2) * 100)
## The line above does not calculate the Present/Absent percentages
## separately for cases and controls within each variable

If I can get those percentages, I can then plot them with ggplot2:

ggplot(tt, aes(Disease, Freq)) +
geom_bar(aes(fill = Var1), position = "stack", stat="identity") + facet_grid(~vars)

How do I get the percentages for cases (present and absent) and controls (present and absent) for each of the variables? Thanks!

Recommended answer

This is a fairly simple extension of the last question. When reshaping the data to long format, we treat Disease just like SampleID, that is, as an identifier column that is kept rather than gathered. Otherwise the code is identical:

library(ggplot2)
library(tidyr)
library(dplyr)
mdf = df %>% select(SampleID, Disease, ends_with("PA")) %>%
    gather(key = Var, value = PA, -SampleID, -Disease) %>%
    mutate(PA = factor(PA, levels = c("Present", "Absent")))
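In current tidyr releases `gather()` still works but is marked as superseded by `pivot_longer()`. The following is a minimal sketch of the same reshaping step with `pivot_longer()`, assuming tidyr 1.0.0 or later. The name `mdf_long` is illustrative only, and the numeric `Var1`/`Var2` columns are left out because this step does not use them. The rows come out in a different order than with `gather()`, which does not affect the plots or summaries.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, minus the unused numeric columns
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control",
              "Case", "Control", "Case", "Control")
)

# mdf_long is a hypothetical name; it plays the role of mdf above
mdf_long <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  # Stack the *PA columns: column names go to Var, cell values go to PA
  pivot_longer(cols = ends_with("PA"), names_to = "Var", values_to = "PA") %>%
  # Put "Present" first so it is the first fill level in the plot
  mutate(PA = factor(PA, levels = c("Present", "Absent")))
```

Each of the 8 samples contributes one row per variable, so the result has 16 rows with the columns SampleID, Disease, Var and PA.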

We can then go directly to a plot, relying on ggplot to compute the percentages: position = "fill" rescales each stacked bar to a total of 1, so the segments show the proportion of Present and Absent within each Disease and Var combination. This is identical to the plot in the previous question, but with Disease on the x-axis and faceting added.

ggplot(mdf, aes(Disease)) +
    geom_bar(aes(fill = PA), position = "fill") +
    scale_y_continuous(labels = scales::percent) +
    facet_grid(~Var)

If you want the percentages in a data frame, we can do that with a little more manipulation:

df_summ = mdf %>% group_by(Disease, Var) %>%
    mutate(n = n()) %>%  ## calculate n for Disease and Var groups
    group_by(Disease, Var, PA) %>%
    summarize(Percent = n() / first(n))  ## calculate the fraction P/A in each group
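The same summary can probably be written a little more compactly with `count()`. This is a sketch of one alternative, not the answer's own method; `df_summ2` is a hypothetical name, and the data frame is rebuilt here (without the unused numeric columns) so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, minus the unused numeric columns
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control",
              "Case", "Control", "Case", "Control")
)

# Long format, as in the answer
mdf <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID, -Disease) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# df_summ2 is a hypothetical name for the alternative summary
df_summ2 <- mdf %>%
  count(Disease, Var, PA) %>%           # rows per Disease/Var/PA combination
  group_by(Disease, Var) %>%
  mutate(Percent = n / sum(n)) %>%      # fraction within each Disease/Var group
  ungroup()
```

As in `df_summ`, `Percent` is a fraction between 0 and 1 rather than a value out of 100. With the example data, the Case rows for Var1PA come out as 0.75 Present and 0.25 Absent, and every other Disease/Var group splits 0.5 and 0.5.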

With that summarized data frame, we can create the same plot as above more explicitly:

ggplot(df_summ, aes(Disease, Percent)) +
    geom_bar(aes(fill = PA), position = "stack", stat = "identity") +
    scale_y_continuous(labels = scales::percent) +
    facet_grid(~Var)
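As a cross-check without dplyr, the `prop.table` idea from the question can be made to work by cross-tabulating Disease against each PA column and normalizing over rows (`margin = 1`). This is only a base R sketch; `pct_by_var` is a hypothetical name, and the data frame is rebuilt without the unused numeric columns so the block is self-contained.

```r
# Same data as in the question, minus the unused numeric columns
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent",
             "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent",
             "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control",
              "Case", "Control", "Case", "Control")
)

vars <- c("Var1PA", "Var2PA")

# pct_by_var is a hypothetical name: one Disease x PA table per variable,
# with each row (Case, Control) rescaled to sum to 100
pct_by_var <- lapply(df[vars], function(x) {
  prop.table(table(Disease = df$Disease, PA = x), margin = 1) * 100
})

pct_by_var$Var1PA
```

For Var1PA the Case row is 25 percent Absent and 75 percent Present, which matches the leftmost bar in the Var1PA facet of the plots above.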
