ggplot boxplot - length of whiskers with logarithmic axis


Problem description


I'm trying to create a horizontal boxplot with a logarithmic axis using ggplot2, but the whiskers come out the wrong length.

A minimal reproducible example:

Some data

library(ggplot2)
library(reshape2)
set.seed(1234)
my.df <- data.frame(a = rnorm(1000,150,50), b = rnorm(1000,500,150))
my.df$a[which(my.df$a < 5)] <- 5
my.df$b[which(my.df$b < 5)] <- 5

If I plot this using base R boxplot(), everything is fine:

boxplot(my.df, log = "x", horizontal = TRUE)

But with ggplot,

my.df.long <- melt(my.df, value.name = "vals")
ggplot(my.df.long, aes(x=variable, y=vals)) +
  geom_boxplot() +
  scale_y_log10(breaks=c(5,10,20,50,100,200,500,1000), limits=c(5,1000)) +
  theme_bw() + coord_flip()

I get this plot, in which the whiskers are the wrong length (see for example how there are many additional outliers below the whiskers and none above).

Note that, without log axes, ggplot has the whiskers the correct length

ggplot(my.df.long, aes(x=variable, y=vals)) +
  geom_boxplot() +
  theme_bw() + coord_flip()

How do I produce a horizontal logarithmic boxplot using ggplot with the correct length whiskers? Preferably with the whiskers extending to 1.5 times the IQR.

Update

As explained here, it is possible to use coord_trans(y = "log10") instead of scale_y_log10, which causes the stats to be calculated on the untransformed data, before the axis is transformed. However, coord_trans cannot be used in combination with coord_flip, so this does not solve the issue of creating horizontal boxplots with a log axis.

Solution

The problem arises because scale_y_log10 transforms the data before the stats are calculated. This does not matter for the median and quartiles, because e.g. 10^log10(median) is still the median value, which is plotted in the correct location. But it does matter for the whiskers, whose limits are calculated using 1.5*IQR: 10^(1.5*IQR(log10(x))) is not equal to 1.5*IQR(x). So the whisker ends are placed incorrectly.

This error becomes evident if we compare

boxplot.stats(my.df$b)$stats
# [1] 117.4978 407.3983 502.0460 601.2937 873.0992
10^boxplot.stats(log10(my.df$b))$stats
# [1] 231.1603 407.3983 502.0459 601.2935 975.1906

Here we see that the median and quartiles are (to within rounding) identical, but the whisker ends (the first and last elements of the stats vector) differ.
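To make the source of the discrepancy explicit, the whisker limits (the "fences", i.e. hinge -/+ 1.5 * IQR) can be computed on both scales. This is a minimal base-R sketch, assuming the same seed and data construction as in the question; the names h_raw, h_log, fence_raw and fence_log are only illustrative.

```r
set.seed(1234)
a <- rnorm(1000, 150, 50)    # drawn first only so that b matches my.df$b
b <- rnorm(1000, 500, 150)
b[b < 5] <- 5

# Fences computed on the raw scale: hinge -/+ 1.5 * IQR
h_raw     <- fivenum(b)[c(2, 4)]
fence_raw <- h_raw + c(-1.5, 1.5) * diff(h_raw)

# Fences computed on the log10 scale, then back-transformed
h_log     <- fivenum(log10(b))[c(2, 4)]
fence_log <- 10^(h_log + c(-1.5, 1.5) * diff(h_log))

# Hinges agree closely after back-transformation; the fences do not
rbind(raw   = c(hinge = h_raw,    fence = fence_raw),
      log10 = c(hinge = 10^h_log, fence = fence_log))
```

Both back-transformed fences sit higher than their raw-scale counterparts, which is consistent with the symptom in the question: extra outliers below the lower whisker and none above the upper one.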

This detailed and useful answer by @eipi10 shows how to calculate the stats yourself and force ggplot to use these user-defined stats rather than the ones it computes internally on the log-transformed data. Using this approach, it becomes relatively simple to calculate the correct statistics on the original scale and plot those instead.

# Function to use boxplot.stats to set the box-and-whisker locations.
# x arrives already log10-transformed (scale_y_log10 is applied before the
# stats), so back-transform, compute the stats, then transform again.
mybxp <- function(x) {
  bxp <- log10(boxplot.stats(10^x)[["stats"]])
  names(bxp) <- c("ymin", "lower", "middle", "upper", "ymax")
  return(bxp)
}

# Function to use boxplot.stats for the outliers (same back-transform trick)
myout <- function(x) {
  data.frame(y = log10(boxplot.stats(10^x)[["out"]]))
}

ggplot(my.df.long, aes(x = variable, y = vals)) + theme_bw() + coord_flip() +
  scale_y_log10(breaks = c(5, 10, 20, 50, 100, 200, 500, 1000), limits = c(5, 1000)) +
  stat_summary(fun.data = mybxp, geom = "boxplot") +
  stat_summary(fun.data = myout, geom = "point")

This produces the correct plot.
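As a quick sanity check of the helper, one can feed it log10-transformed values (which is what ggplot passes to it under scale_y_log10) and compare the back-transformed result with boxplot.stats on the raw data. This is only a sketch, assuming the same seed and data construction as in the question; from_helper and from_base are illustrative names.

```r
# Same helper as in the answer, repeated so the snippet runs on its own
mybxp <- function(x) {
  bxp <- log10(boxplot.stats(10^x)[["stats"]])
  names(bxp) <- c("ymin", "lower", "middle", "upper", "ymax")
  bxp
}

set.seed(1234)
a <- rnorm(1000, 150, 50)    # drawn first only so that b matches my.df$b
b <- rnorm(1000, 500, 150)
b[b < 5] <- 5

# Simulate what ggplot does: hand the helper log10-transformed values
from_helper <- 10^mybxp(log10(b))
from_base   <- boxplot.stats(b)$stats

# Compare up to floating-point tolerance
all.equal(unname(from_helper), from_base)
```

The comparison is expected to hold up to floating-point tolerance, which confirms that the box and whiskers drawn by stat_summary sit where base R's boxplot() would put them.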

A note on using coord_trans as an alternative approach:

Using coord_trans(y = "log10") instead of scale_y_log10 causes the stats to be calculated (correctly) on the untransformed data. However, coord_trans cannot be used in combination with coord_flip, so this does not solve the issue of creating horizontal boxplots with a log axis. The suggestion here to use ggdraw(switch_axis_position()) from the cowplot package to flip the axes after using coord_trans did not work; it throws an error (cowplot v0.4.0 with ggplot2 v2.1.0):

Error in Ops.unit(gyl$x, grid::unit(0.5, "npc")) : both operands must be units

In addition: Warning message: axis.ticks.margin is deprecated. Please set margin property of axis.text instead
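For readers on a newer ggplot2 than the v2.1.0 used above, the coord_trans route may be workable without coord_flip. The sketch below assumes ggplot2 3.3.0 or later, where geom_boxplot can be drawn horizontally by mapping the continuous variable to x and the grouping variable to y; this was not tested against the versions named in this answer. The long data frame is built with base R here purely to keep the snippet self-contained.

```r
library(ggplot2)

set.seed(1234)
my.df <- data.frame(a = rnorm(1000, 150, 50), b = rnorm(1000, 500, 150))
my.df$a[my.df$a < 5] <- 5
my.df$b[my.df$b < 5] <- 5

# Long format without reshape2 (equivalent in shape to melt(my.df))
my.df.long <- data.frame(
  variable = rep(c("a", "b"), each = nrow(my.df)),
  vals     = c(my.df$a, my.df$b)
)

# Assumption: ggplot2 >= 3.3.0, so no coord_flip() is needed.
# The stats are computed on the raw values; coord_trans only warps the axis.
p <- ggplot(my.df.long, aes(x = vals, y = variable)) +
  geom_boxplot() +
  coord_trans(x = "log10") +
  theme_bw()
p
```

Because the transformation happens in the coordinate system rather than in the scale, the whiskers are based on 1.5 times the IQR of the original data, as requested in the question.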
