Creating a ggplot2 histogram with a cumulative distribution curve (ECDF) on a different scale

Question

Using ggplot2, I can create a histogram with a cumulative distribution curve with the following code. However, the stat_ecdf curve is drawn against the left (count) y-axis rather than the secondary axis.

library(ggplot2)
test.data <- data.frame(values = replicate(1, sample(0:10,1000, rep=TRUE)))
g <- ggplot(test.data, aes(x=values))
g + geom_bar() + 
    stat_ecdf() + 
    scale_y_continuous(sec.axis=sec_axis(trans = ~./100, name="percentage"))

Here is the graph generated (the ECDF is squashed along the bottom, because it only runs from 0 to 1 while the bars reach counts of roughly 100):

How do I scale the stat_ecdf curve to the second y-axis?

Recommended answer

In general, you want to multiply the internally calculated ECDF value (the cumulative proportion, which stat_ecdf exposes as ..y..) by the inverse of the axis transformation, so that its vertical extent is similar to that of the bars. Here the secondary axis divides by 100, so the ECDF is multiplied by 100:

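library(tidyverse)  # loads ggplot2
library(scales)     # provides percent() for the axis labels

set.seed(2)
test.data <- data.frame(values = sample(0:10, 1000, replace = TRUE))

ggplot(test.data, aes(x = values)) +
  geom_bar(fill = "grey70") +
  # Stretch the 0-1 ECDF to 0-100 so it spans the same range as the bars.
  stat_ecdf(aes(y = ..y.. * 100)) +
  # The secondary axis undoes that scaling (divide by 100) and labels it as a percentage.
  scale_y_continuous(sec.axis = sec_axis(trans = ~ . / 100, name = "percentage", labels = percent)) +
  theme_bw()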
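
The scaling in the plot above can be checked with plain arithmetic. The following is a minimal base R sketch, not part of the original answer: it uses stats::ecdf in place of stat_ecdf, and the name scale_factor is hypothetical. It shows that multiplying by 100 for drawing and dividing by 100 on the secondary axis returns the original proportions.

```r
set.seed(2)
values <- sample(0:10, 1000, replace = TRUE)

# Empirical CDF evaluated at each possible value; results are proportions in [0, 1].
Fn <- ecdf(values)
cum_prop <- Fn(0:10)

# Hypothetical name: the inverse of the secondary-axis transformation ~ . / 100.
scale_factor <- 100

# Heights at which the curve is drawn against the primary (count) axis.
plotted_y <- cum_prop * scale_factor

# Values the secondary axis reports for those heights.
secondary <- plotted_y / scale_factor

# The round trip recovers the original proportions.
stopifnot(isTRUE(all.equal(secondary, cum_prop)))
```

Any other factor works the same way, as long as the curve is multiplied by exactly the number the secondary axis divides by.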

Because you distributed 1,000 values randomly among 11 buckets, the bar heights happen to land near 100, so a factor of 100 puts both y-scales on round multiples of 10. Below is a more general version.

In addition, it would be nice to determine the transformation factor programmatically, so that we don't have to pick it by hand after seeing the bar heights in the plot. To do that, we calculate the height of the highest bar outside ggplot and use that value (called max_y below) in the plot. We also use the pretty function to round max_y up to the highest break value that covers the tallest bar (ggplot's default axis breaks generally agree with those produced by pretty), so that the primary and secondary y-axis breaks line up.
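
As a rough illustration of that step on its own, the base R sketch below computes the tallest bar and rounds it up with pretty. The variable name tallest_bar is introduced here for illustration and does not appear in the original answer.

```r
set.seed(2)
values <- sample(0:10, 768, replace = TRUE)

# Count of the most frequent value, i.e. the height of the tallest bar.
tallest_bar <- max(table(values))

# pretty() returns round break values covering 0..tallest_bar;
# the largest of them is the scale factor.
breaks <- pretty(c(0, tallest_bar))
max_y <- max(breaks)

# Each break on the count axis maps to a proportion on the secondary axis.
data.frame(count = breaks, proportion = breaks / max_y)
```

Because max_y is itself one of the breaks, the top break maps to exactly 1 (100%) on the secondary axis.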

Finally, we use aes_ and bquote to create a quoted call, so that ggplot will recognize the passed max_y value.
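
To see what bquote contributes, here is a small base R sketch. The value 800 is only a stand-in for max_y. bquote quotes the expression but substitutes the current value of anything wrapped in .(), so the resulting call no longer refers to the variable.

```r
# Stand-in value for illustration; in the answer this comes from pretty().
max_y <- 800

# ..y.. stays an unevaluated symbol, while .(max_y) is replaced by its value.
quoted <- bquote(..y.. * .(max_y))

class(quoted)    # a call object
deparse(quoted)  # the expression as text
```

The full plot that follows passes such a quoted call to aes_.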

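set.seed(2)
# 768 draws, so the bar heights no longer sit near a convenient round number.
test.data <- data.frame(values = sample(0:10, 768, replace = TRUE))

# Height of the tallest bar, computed outside ggplot.
max_y <- max(table(test.data$values))
# Round up to the top pretty break so the breaks on both axes line up.
max_y <- max(pretty(c(0, max_y)))

ggplot(test.data, aes(x = values)) +
  geom_bar(fill = "grey70") +
  # bquote() substitutes the value of max_y into the quoted expression.
  stat_ecdf(aes_(y = bquote(..y.. * .(max_y)))) +
  # The secondary axis divides by max_y, mapping counts back to 0-1.
  scale_y_continuous(sec.axis = sec_axis(trans = ~ . / max_y, name = "percentage", labels = percent)) +
  theme_bw()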
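
In recent ggplot2 releases, aes_() and the ..y.. notation are deprecated. The following sketch is an adaptation rather than the original answer's code. It assumes ggplot2 3.3.0 or later (for after_stat()) and passes the secondary-axis formula positionally, since the name of that argument differs between versions.

```r
library(ggplot2)

set.seed(2)
test.data <- data.frame(values = sample(0:10, 768, replace = TRUE))

# Tallest bar, rounded up to the top pretty break.
max_y <- max(pretty(c(0, max(table(test.data$values)))))

p <- ggplot(test.data, aes(x = values)) +
  geom_bar(fill = "grey70") +
  # after_stat(y) is the computed ECDF value; aes() captures max_y from the
  # calling environment, so no quoting helper is needed here.
  stat_ecdf(aes(y = after_stat(y) * max_y)) +
  scale_y_continuous(
    # First argument of sec_axis() is the transformation formula.
    sec.axis = sec_axis(~ . / max_y, name = "percentage", labels = scales::percent)
  ) +
  theme_bw()

p
```

If the plot is built inside a function, max_y still needs to be visible where aes() and the formula are created, which is the case when it is a local variable of that function.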
