R: add normal fits to grouped histograms in ggplot2


Question


I am on the lookout for the most elegant way to superimpose normal distribution fits on grouped histograms in ggplot2. I know this question has been asked many times before, but none of the proposed options, like this one or this one, struck me as very elegant, at least not unless stat_function could be made to work on each subset of the data.

One relatively elegant way I did come across to superimpose a normal distribution fit onto a non-grouped histogram was geom_smooth with method="nls" (apart from the fact that it is not a self-starting function, so starting values have to be specified):

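library(ggplot2)
myhist <- data.frame(size = 10:27, counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L, 22L, 14L, 7L, 4L, 2L, 2L, 1L))
# nls is not self-starting, so starting values are required; recent ggplot2
# versions expect them inside method.args rather than as a direct argument
ggplot(data = myhist, aes(x = size, y = counts)) + geom_point() +
  geom_smooth(method = "nls", formula = y ~ N * dnorm(x, m, s), se = FALSE,
              method.args = list(start = list(m = 20, s = 5, N = 300)))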
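
As a rough sketch of what that smoother does behind the scenes, the same model can be fitted directly with stats::nls on the binned counts. This uses only base R; the object names fit and est are illustrative and not part of the original post.

```r
# Binned data from the question: bin position (size) and bin count (counts)
myhist <- data.frame(
  size   = 10:27,
  counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L,
             40L, 42L, 22L, 14L, 7L, 4L, 2L, 2L, 1L)
)

# Fit counts ~ N * dnorm(size, m, s); nls needs explicit starting values
# because this model is not self-starting.
fit <- nls(counts ~ N * dnorm(size, m, s),
           data  = myhist,
           start = list(m = 20, s = 5, N = 300))

est <- coef(fit)              # named vector with elements m, s and N
print(est)

# Fitted counts at each bin position, e.g. for drawing with geom_line()
myhist$fitted <- predict(fit)
```

Here N plays the role of a scale factor that turns the normal density into counts, which is why it is estimated alongside the mean m and standard deviation s.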

I was wondering though whether this approach could also be used to add normal distribution fits to grouped histograms as in

library(devtools)
install_github("tomwenseleers/easyGgplot2",type="source")
library("easyGgplot2") # load weight data
ggplot(weight, aes(x = weight)) +
  geom_histogram(aes(y = ..count.., colour = sex, fill = sex), alpha = 0.5, position = "identity")

I was also wondering whether there are any packages that define a + stat_distrfit() or + stat_normfit() for ggplot2 (with the possibility of grouping). I couldn't really find anything, but this seems like a common enough task, so I was just wondering.

The reason I want the code to be as short as possible is that this is for a course, and I want to keep things as easy as possible...

PS: geom_density does not suit my goal, and I would also like to plot the counts/frequencies as opposed to the density. I would also like to have them in the same panel and avoid using facet_wrap.

Solution

Like this?

## simulate your dataset - could not get easyGgplot2 to load....
set.seed(1)     # for reproducible example
weight <- data.frame(sex=c("Female","Male"), weight=rnorm(1000,mean=c(65,67),sd=1))

library(ggplot2)
library(MASS)       # for fitdistr(...)
# maximum likelihood estimates of mean and sd for one group
get.params <- function(z) with(fitdistr(z,"normal"),estimate[1:2])
df <- aggregate(weight~sex, weight, get.params)
df <- data.frame(sex=df[,1],df[,2])      # columns: sex, mean, sd
x  <- with(weight, seq(min(weight),max(weight),length.out=100))
# each=nrow(df) pairs every x value with every group, because the rows of df
# are recycled in order (Female, Male, Female, Male, ...)
gg <- data.frame(weight=rep(x,each=nrow(df)),df)
gg$y <- with(gg,dnorm(weight,mean,sd))   # fitted density per group
# rescale density to counts: group size * bin width
gg$y <- gg$y * aggregate(weight~sex, weight,length)$weight * diff(range(weight$weight))/30

ggplot(weight,aes(x = weight, colour=sex)) +
  geom_histogram(aes(y = ..count.., fill=sex), alpha=0.5,position="identity") +
  geom_line(data=gg, aes(y=y))

I suppose "elegant" is in the eye of the beholder. The problem with using stat_function(...) is that the args=... list cannot be mapped using aes(...), as the post in the comments explains. So you have to create an auxiliary data.frame (gg in this example) that has the x- and y-values for the fitted distributions, and use geom_line(...).
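
For a course setting, the auxiliary data.frame step could be wrapped in a small function. The sketch below is a hypothetical helper (normal_fit_curves is not part of ggplot2, MASS, or any package I know of) that returns one fitted normal density curve per group in long format. It uses the sample mean and sd() instead of fitdistr(...), which gives nearly the same estimates for groups of this size.

```r
# Hypothetical helper: one fitted normal density curve per group,
# returned as a long data frame suitable for geom_line().
normal_fit_curves <- function(values, groups, n_points = 100) {
  x <- seq(min(values), max(values), length.out = n_points)
  pieces <- lapply(split(values, groups), function(v) {
    data.frame(x = x, density = dnorm(x, mean = mean(v), sd = sd(v)))
  })
  out <- do.call(rbind, pieces)
  out$group <- rep(names(pieces), each = n_points)  # label rows by group
  rownames(out) <- NULL
  out
}

# Simulated data, similar in spirit to the example in this answer
set.seed(1)
weight <- data.frame(
  sex    = rep(c("Female", "Male"), each = 500),
  weight = c(rnorm(500, mean = 65), rnorm(500, mean = 67))
)

curves <- normal_fit_curves(weight$weight, weight$sex)
head(curves)

# Plot only if ggplot2 is installed; call print(p) to display the figure
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(curves, aes(x = x, y = density, colour = group)) + geom_line()
}
```

The curves returned here are densities; to overlay them on a count histogram they would still need to be multiplied by the group size and the bin width.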

The code above uses fitdistr(...) in the MASS package to calculate maximum likelihood estimates of the mean and sd of your data, grouped by gender, based on the normality assumption (you can use a different distribution if that makes sense). It then creates an x-axis by dividing the range in weight into 100 increments, and calculates dnorm(x,...) for the appropriate mean and sd. Since the result is density, we have to adjust that using:

gg$y <- gg$y * aggregate(weight~sex, weight,length)$weight * diff(range(weight$weight))/30

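because you want to map this against count data: multiplying the density by the number of observations in each group and by the bin width gives the expected count per bin. Note that this assumes you use the default binning in geom_histogram (which divides the range in x into 30 equal increments); if you set bins or binwidth yourself, use that bin width in place of diff(range(weight$weight))/30. Finally, we add the call to geom_line(...) using gg as the layer-specific dataset.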
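
The density-to-count scaling can be checked numerically in base R. The sketch below is an illustration under assumed values (n, bw and the simulated sample z are not from the original answer): it bins a standard normal sample with a fixed bin width and compares the observed counts with n * bw * dnorm(...) at the bin midpoints.

```r
set.seed(42)
n  <- 10000                   # assumed sample size for the illustration
bw <- 0.5                     # assumed bin width
z  <- rnorm(n)                # standard normal sample

# Bin edges chosen so that one bin is centred on 0
breaks <- seq(-6.25, 6.25, by = bw)
h <- hist(z, breaks = breaks, plot = FALSE)

# Expected count per bin: sample size * bin width * density at the midpoint
expected <- n * bw * dnorm(h$mids)

comparison <- data.frame(mid      = h$mids,
                         observed = h$counts,
                         expected = round(expected, 1))
print(comparison[abs(comparison$mid) <= 1, ])
```

The same factor applies in ggplot2 when the bin width is passed explicitly, for example geom_histogram(binwidth = bw). Fixing binwidth this way also avoids relying on the range/30 assumption, since the default bin width that geom_histogram picks may differ between ggplot2 versions.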
