"Density" curve overlay on histogram where vertical axis is frequency (aka count) or relative frequency?


Question


Is there a way to overlay something analogous to a density curve when the vertical axis is frequency or relative frequency? (It would not be an actual density function, since the area need not integrate to 1.) The following question is similar: "ggplot2: histogram with normal curve", where the user self-answers with the idea of scaling by ..count.. inside geom_density(). However, this seems unusual.

The following code produces an overinflated "density" line.

library(ggplot2)

df1            <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1             <- seq(4.5, 12, by = 0.1)
# Histogram in counts, with the smoothed line also mapped to ..count..
hist.1a        <- ggplot(df1, aes(v)) + 
                    stat_bin(aes(y = ..count..), color = "black", fill = "blue",
                             breaks = b1) + 
                    geom_density(aes(y = ..count..))
hist.1a

Solution

@joran's response/comment got me thinking about what the appropriate scaling factor would be. For posterity's sake, here's the result.

When Vertical Axis is Frequency (aka Count)

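A density curve encloses an area of 1, whereas a histogram of counts with N observations and equal-width bins encloses an area of N × (bin width). Thus, the scaling factor for a vertical axis measured in bin counts is

N × (bin width)

In this case, with N = 164 and a bin width of 0.1, the aesthetic for y in the smoothed line should be: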

y = ..density..*(164 * 0.1)

Thus the following code produces a "density" line scaled for a histogram measured in frequency (aka count).

df1            <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
b1             <- seq(4.5, 12, by = 0.1)
hist.1a        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..count..), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..*(164*0.1)))
hist.1a
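
As a quick numerical check of the N × (bin width) factor, here is a minimal sketch in base R (no ggplot2). The seed, the break range and the variable names are assumptions made for illustration; only N = 164 and the bin width of 0.1 come from the example above.

```r
# Sanity check of the N * (bin width) scaling factor using base R only.
set.seed(1)                                   # arbitrary seed, for reproducibility
v     <- rnorm(164, mean = 9, sd = 1.5)
n_obs <- length(v)                            # N = 164
bin_w <- 0.1                                  # bin width
brks  <- seq(floor(min(v)), ceiling(max(v)), by = bin_w)  # covers all of v

# hist() reports both counts and densities for the same bins.
h <- hist(v, breaks = brks, plot = FALSE)
counts_from_density <- h$density * n_obs * bin_w

# Kernel density estimate, rescaled to the count axis.
d        <- density(v)
y_scaled <- d$y * n_obs * bin_w

# Trapezoid-rule area under the rescaled curve.
area_scaled <- sum(diff(d$x) * (head(y_scaled, -1) + tail(y_scaled, -1)) / 2)

print(all.equal(h$counts, counts_from_density))
print(c(curve_area = area_scaled, histogram_area = n_obs * bin_w))
```

The bin densities multiplied by N × (bin width) should reproduce the bin counts, and the area under the rescaled curve should come out close to the total area of the count histogram.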

When Vertical Axis is Relative Frequency

Dividing the count-axis scaling factor by N = 164 leaves just the bin width, so we could write

hist.1b        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..count../164), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..*(0.1)))
hist.1b

When Vertical Axis is Density

hist.1c        <- ggplot(df1, aes(x = v)) + 
                    geom_histogram(aes(y = ..density..), breaks = b1, 
                                   fill = "blue", color = "black") + 
                    geom_density(aes(y = ..density..))
hist.1c
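
The snippets above hard-code 164 and 0.1. A variant that takes both from the data might look like the sketch below. The names n_obs, bin_w, hist.count and hist.rel are assumptions introduced here, and after_stat() is used in place of the ..density.. notation, on the assumption that a reasonably recent ggplot2 (3.3.0 or later) is installed.

```r
library(ggplot2)

set.seed(1)                                   # arbitrary seed, for reproducibility
df1   <- data.frame(v = rnorm(164, mean = 9, sd = 1.5))
bin_w <- 0.1                                  # bin width
b1    <- seq(4.5, 12, by = bin_w)             # same breaks as above
n_obs <- nrow(df1)                            # N, taken from the data

# Vertical axis in counts: scale the density by N * (bin width).
hist.count <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(count)), breaks = b1,
                 fill = "blue", color = "black") +
  geom_density(aes(y = after_stat(density) * n_obs * bin_w))
hist.count

# Vertical axis in relative frequency: scale the density by the bin width only.
hist.rel <- ggplot(df1, aes(x = v)) +
  geom_histogram(aes(y = after_stat(count) / n_obs), breaks = b1,
                 fill = "blue", color = "black") +
  geom_density(aes(y = after_stat(density) * bin_w))
hist.rel
```

With this arrangement, changing the sample size or the bin width only requires editing the values near the top, and the two scaling factors follow.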
