ggplot's scale_y_log10 behavior

Question

Trying to plot a stacked histogram using ggplot:

library(ggplot2)

set.seed(1)
my.df <- data.frame(param = runif(10000, 0, 1),
                    x = runif(10000, 0.5, 1))
my.df$param.range <- cut(my.df$param, breaks = 5)

Without log-transforming the y-axis:

ggplot(my.df, aes(x = x, fill = param.range)) +
    geom_histogram(binwidth = 0.1, pad = TRUE) +
    scale_fill_grey()

This gives:

[Figure: stacked histogram with a linear y-axis]

But I want to apply a log10(count + 1) transformation to the y-axis to make it easier to read:

ggplot(my.df, aes(x = x, y = after_stat(count + 1), fill = param.range)) +
    geom_histogram(binwidth = 0.1, pad = TRUE) +
    scale_fill_grey() +
    scale_y_log10()

This gives:

[Figure: the same stacked histogram with a log10 y-axis]

The tick marks on the y-axis don't make sense.

I get the same behavior with a plain log10 transformation instead of log10(count + 1):

ggplot(my.df, aes(x = x, fill = param.range)) +
    geom_histogram(binwidth = 0.1, pad = TRUE) +
    scale_fill_grey() +
    scale_y_log10()

Any idea what's going on?

Answer

It looks like invoking scale_y_log10 with a stacked histogram causes ggplot to plot the product of the counts of the stack's components within each x bin. Below is a demonstration. We create a data frame called product.of.counts that contains, for each x bin, the product of the counts of all param.range groups in that bin. We then use geom_text to add those values to the plot and see that they coincide with the top of each stack of histogram bars.

At first I thought this was a bug, but after a bit of searching, I was reminded of the way ggplot does the log transformation. As described in the linked answer, "scale_y_log10 makes the counts, converts them to logs, stacks those logs, and then displays the scale in the anti-log form. Stacking logs, however, is not a linear transformation, so what you have asked it to do does not make any sense."
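Written out for a single x bin whose k stacked groups have counts $n_1, \dots, n_k$ (notation introduced here only for illustration), the mechanism described in the quote is: the stack height is built from the log-transformed counts, and the axis labels show its anti-log.

$$
h = \sum_{i=1}^{k} \log_{10} n_i
\quad\Longrightarrow\quad
10^{h} = \prod_{i=1}^{k} 10^{\log_{10} n_i} = \prod_{i=1}^{k} n_i
\quad\text{rather than the true total}\quad
\sum_{i=1}^{k} n_i .
$$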

As a simpler example, say each of the five components of a stacked bar has a count of 100. Then log10(100) = 2 for all five, and the sum of the logs is 10. ggplot then takes the anti-log for the axis scale, which gives 10^10 (that is, 100^5) for the total height of the bar, even though the actual height is 100 × 5 = 500. This is exactly what's happening with your plot.
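The arithmetic of this toy example can be checked directly in R. This is a minimal sketch that only mimics the "stack the logs, then label with the anti-log" step in plain vectors; it does not call ggplot2 internals, and the variable names are illustrative.

```r
# Five stacked components, each with a count of 100.
counts <- rep(100, 5)

# True height of the stacked bar on a linear count scale.
linear_height <- sum(counts)                  # 500

# What the log10 axis effectively shows: stack log10(counts),
# then back-transform the stacked height with 10^.
log_stacked_height <- 10^sum(log10(counts))   # 1e+10

# The anti-log of a sum of logs is the product of the counts.
all.equal(log_stacked_height, prod(counts))   # TRUE
```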

library(dplyr)
library(ggplot2)

# Data
set.seed(1)
my.df <- data.frame(param = runif(10000, 0, 1), x = runif(10000, 0.5, 1))
my.df$param.range <- cut(my.df$param, breaks = 5)

# Calculate the product of the counts within each x bin.
# The cut() breaks line up with the histogram bins (width 0.1, centred on 0, 0.1, ..., 1).
product.of.counts <- my.df %>%
  group_by(param.range,
           breaks = cut(x, breaks = seq(-0.05, 1.05, 0.1), labels = seq(0, 1, 0.1))) %>%
  tally() %>%
  group_by(breaks) %>%
  summarise(prod = prod(n),
            # geom_text() inherits fill = param.range, so the column must exist
            param.range = NA) %>%
  ungroup() %>%
  mutate(breaks = as.numeric(as.character(breaks)))

ggplot(my.df, aes(x, fill = param.range)) +
  geom_histogram(binwidth = 0.1, colour = "grey30") +
  scale_fill_grey() +
  scale_y_log10(breaks = 10^(0:14)) +
  geom_text(data = product.of.counts, size = 3.5,
            aes(x = breaks, y = prod, label = format(prod, scientific = TRUE, digits = 3)))
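The same conclusion can be checked numerically, without the geom_text() overlay, by inspecting the data ggplot2 builds for the histogram layer. This is a sketch that assumes a ggplot2 version providing `layer_data()` and keeping stat_bin's raw `count` column in the built data; with `scale_y_log10()` the `ymax` values are in log10 units, so back-transforming the top of each stack should reproduce the product of the counts. The helper variable names are illustrative.

```r
library(ggplot2)

# Same data as in the question.
set.seed(1)
my.df <- data.frame(param = runif(10000, 0, 1), x = runif(10000, 0.5, 1))
my.df$param.range <- cut(my.df$param, breaks = 5)

p <- ggplot(my.df, aes(x, fill = param.range)) +
  geom_histogram(binwidth = 0.1) +
  scale_y_log10()

# Built data for the histogram layer: y/ymin/ymax are in log10 space,
# while `count` holds the untransformed counts computed by stat_bin.
ld <- layer_data(p, 1)

# Top of each stack per x bin, back-transformed to the count scale.
displayed_height <- 10^tapply(ld$ymax, ld$x, max)

# Product and (true) sum of the raw counts within each x bin.
prod_counts <- tapply(ld$count, ld$x, prod)
sum_counts  <- tapply(ld$count, ld$x, sum)

cbind(displayed_height, prod_counts, sum_counts)
```

If the assumptions hold, the first two columns agree and both are far larger than the true bin totals in the third column.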
