How to interpret the different ggplot2 densities?
Problem description
I am confused about the meaning of the following variants of geom_density in ggplot.
Can someone please explain the difference between these four calls:
- geom_density(aes_string(x=myvar))
- geom_density(aes_string(x=myvar, y=..density..))
- geom_density(aes_string(x=myvar, y=..scaled..))
- geom_density(aes_string(x=myvar, y=..count../sum(..count..)))
My understanding is:
- geom_density alone will produce a density whose area under the curve is 1
- geom_density with ..density.. basically does the same...?
- ..count../sum(..count..) will normalize the peak heights to be more like a normalized histogram, ensuring that all the heights sum to 1
- ..count.. by itself, without the denominator, will just multiply each bin by the number of items in it
- the ..scaled.. parameter will make it so the maximum value of the density is 1
I find ..scaled.. very counterintuitive, and if my interpretation of it is correct I have never seen it used, so I'd like to ignore it. I am mainly looking for a clarification of the differences between geom_density and a kind of normalized density plot, which I am assuming requires the ..count../... argument. Thanks.
(Related: mapping a variable to y and using stat = "bin" in ggplot2)
Recommended answer
The default y aesthetic for stat_density is ..density.., so a call to geom_density, which uses stat_density by default, will plot y = ..density.. by default.
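As a quick check of that equivalence, here is a minimal sketch that builds both plots and compares the computed y values. The data frame `df` and its column `myvar` are made up for illustration. The sketch uses `after_stat(density)`, which is the newer spelling of `..density..` and assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(myvar = rnorm(200))   # hypothetical sample data

# Default call: y is taken from the stat's computed density.
p_default <- ggplot(df, aes(x = myvar)) + geom_density()

# Explicit call: after_stat(density) is the newer form of ..density..
p_explicit <- ggplot(df, aes(x = myvar)) + geom_density(aes(y = after_stat(density)))

# layer_data() returns the data frame that the layer actually draws.
same_y <- isTRUE(all.equal(layer_data(p_default)$y, layer_data(p_explicit)$y))
print(same_y)
```

If the two calls really are equivalent, `same_y` should be TRUE.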
Looking at the source code, ..scaled.. is defined as
densdf$scaled <- densdf$y / max(densdf$y, na.rm = TRUE)
Feel free to ignore it if you wish.
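To make the variables from the question concrete, the following base-R sketch reproduces them outside ggplot2 with `density()`. The sample `x` is made up, and the line computing `count` as density times the number of observations reflects my understanding of stat_density rather than anything quoted above, so treat it as an assumption.

```r
set.seed(1)
x <- rnorm(200)                          # hypothetical sample

d <- density(x)                          # kernel density estimate on a grid of points

dens   <- d$y                            # corresponds to ..density..
scaled <- d$y / max(d$y, na.rm = TRUE)   # corresponds to ..scaled.. (peak height 1)
count  <- d$y * length(x)                # assumed definition of ..count..
prop   <- count / sum(count)             # the asker's ..count../sum(..count..)

# Trapezoid-rule approximation of the area under the density curve.
area <- sum(diff(d$x) * (head(dens, -1) + tail(dens, -1)) / 2)

print(area)        # close to 1
print(max(scaled)) # 1
print(sum(prop))   # 1 (the heights sum to 1, not the area)
```

Note that `prop` depends on how many grid points the curve is evaluated at, so its heights are not comparable to a histogram's bin proportions without further care.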
Looking at the source code for stat_bin, the results are calculated as
res <- within(results, {
  count[is.na(count)] <- 0
  density <- count / width / sum(abs(count), na.rm=TRUE)
  ncount <- count / max(abs(count), na.rm=TRUE)
  ndensity <- density / max(abs(density), na.rm=TRUE)
})
So if you want to compare with the results of geom_histogram (using the default stat = 'bin'), you can set y = ..density.. there, and it will calculate count / width / sum(count) for you, that is, count / sum(count) adjusted for the width of the bins.
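The stat_bin formulas can be tried out in base R with `hist()`. This is only a sketch: the sample `x` and the choice of `breaks = 20` are made up, and `hist()` stands in for the binning that stat_bin performs.

```r
set.seed(1)
x <- rnorm(200)                           # hypothetical sample

h     <- hist(x, breaks = 20, plot = FALSE)
count <- h$counts                         # observations per bin
width <- diff(h$breaks)                   # width of each bin

# Same arithmetic as the stat_bin snippet quoted in the answer.
density  <- count / width / sum(abs(count), na.rm = TRUE)
ncount   <- count / max(abs(count), na.rm = TRUE)
ndensity <- density / max(abs(density), na.rm = TRUE)

print(sum(density * width))  # total bar area is 1
print(max(ncount))           # tallest bar rescaled to 1
print(max(ndensity))         # tallest density bar rescaled to 1
```

Because the bar areas sum to 1, a histogram drawn with this `density` sits on the same scale as a kernel density curve whose area is 1.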
If you wanted to compare geom_density(aes(y=..scaled..)) with stat_bin, then you would use geom_histogram(aes(y = ..ndensity..)).
You could also get them on the same scale by using ..count.. in both; however, you would need to tune the adjust parameter of stat_density to get an appropriately detailed approximation of the curve.
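As a rough illustration of what `adjust` does, the base-R sketch below uses `density()`, where `adjust` multiplies the default bandwidth. The sample `x` is made up, and treating the density-based count as density times the number of observations is my assumption, not something stated in the answer.

```r
set.seed(1)
x <- rnorm(200)                     # hypothetical sample
n <- length(x)

d_default <- density(x)             # default bandwidth
d_wiggly  <- density(x, adjust = 0.5)  # half the bandwidth, more detailed curve

print(d_default$bw)
print(d_wiggly$bw)                  # half of the default bandwidth

# Assumed count scale for the curve: density times number of observations.
count_curve <- d_default$y * n

# Trapezoid-rule area under the count curve; it comes out close to n.
area_count <- sum(diff(d_default$x) *
                  (head(count_curve, -1) + tail(count_curve, -1)) / 2)
print(area_count)
```

Since the area under the count curve is about n while histogram counts sum to n, the two would line up exactly only for a bin width of 1; for other bin widths the curve would presumably need to be multiplied by the bin width.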