Multi-group histogram with group-specific frequencies


Question


First off, I've already read the following thread: ggplot2 - Multi-group histogram with in-group proportions rather than frequency

I followed the ddply suggestion and it didn't seem to work for my data. Logically the code should work perfectly on my dataset and I have no idea what I'm doing wrong.

Overall: I'd like to make a histogram (I'm learning ggplot) that displays the genotype frequency in each of my study groups.

Something like this:

Here's a mock data set that mirrors my own:

df <- data.frame(ID = 1:60,
                 Genotypes = sample(c("CG", "CC", "GG"), 60, replace = TRUE),
                 Study_Group = sample(c("Control", "Pathology1", "pathology2"), 60, replace = TRUE))

I've tried variants of p + geom_bar(aes(y = ..count../sum(..count..))), but R returns "cannot find 'count' object" or something to that effect.

I also tried:

df.new<-ddply(df,.(Study_Group),summarise,
              prop=prop.table(table(df$Genotype)),
              Genotype=names(table(df$Genotype)))

And I believe there was an error with the summarise function, but to be honest, I have no idea what I'm doing.

Is the problem simply my comprehension of the solution or is it something inherently different in my data set?

Thanks for the help.

Solution

Give this a try. Here I am using dplyr, a package that provides updated versions of the ddply-type functions from plyr. One caveat: I am not sure whether you want the x-axis to show the Study_Groups or the Genotypes. Your question says you want the frequency of each Genotype within each group, but your graph has the Genotypes on the x-axis. The solution follows the stated desire, not the plot. However, changing it to put Genotypes on the x-axis is simple; I note in the code comments where and what to change.

library(dplyr)
library(ggplot2)

df2 <- df %>%
  count(Study_Group, Genotypes) %>%
  group_by(Study_Group) %>% #change to `group_by(Genotypes) %>%` for alternative approach
  mutate(prop = n / sum(n))

ggplot(data = df2, aes(Study_Group, prop, fill = Genotypes)) + 
  geom_bar(stat = "identity", position = "dodge")
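As a cross-check on the within-group proportions, the same numbers can be sketched in base R with prop.table(). This is only an illustration, not part of the original answer: the seed, the `tab`, `props` and `group_totals` names, and the optional plot with Genotypes on the x-axis are assumptions added here.

```r
# Mock data with the same columns as in the question (seed added so the run is repeatable).
set.seed(1)
df <- data.frame(ID = 1:60,
                 Genotypes = sample(c("CG", "CC", "GG"), 60, replace = TRUE),
                 Study_Group = sample(c("Control", "Pathology1", "pathology2"), 60, replace = TRUE))

# Rows are study groups, columns are genotypes.
# margin = 1 divides every row by its own total, giving within-group proportions.
tab <- prop.table(table(Study_Group = df$Study_Group, Genotypes = df$Genotypes),
                  margin = 1)

# Long format: one row per group/genotype pair, with the proportion in `prop`.
props <- as.data.frame(tab, responseName = "prop")

# The proportions inside each study group should add up to 1.
group_totals <- tapply(props$prop, props$Study_Group, sum)
print(group_totals)

# Optional plot with Genotypes on the x-axis; skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(props, aes(Genotypes, prop, fill = Study_Group)) +
    geom_bar(stat = "identity", position = "dodge")
  print(p)
}
```

If the totals printed by `group_totals` are all 1, the proportions are normalised within each study group, which is what the dplyr pipeline with group_by(Study_Group) should also produce.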
