Remove outliers fully from multiple boxplots made with ggplot2 in R and display the boxplots in expanded format


Problem description


I have some data in a .txt file, which I read into a data frame df:

df <- read.table("data.txt", header=T,sep="\t")

I remove the negative values in the column x (since I need only positive values) of the df using the following code,

yp <- subset(df, x>0)

Now I want to plot multiple box plots in the same layer. I first melt the data frame df; the resulting plot contains several outliers, as shown below.

library(reshape2)  # melt(); assumed to come from reshape2
library(ggplot2)
library(scales)    # trans_breaks(), trans_format(), math_format()

# Melting data frame df
df_mlt <- melt(df, id = names(df)[1])

# Plotting the boxplots
plt_wool <- ggplot(subset(df_mlt, value > 0), aes(x=ID1,y=value)) +
  geom_boxplot(aes(color=factor(ID1))) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
  theme_bw() +
  theme(legend.text=element_text(size=14), legend.title=element_text(size=14))+
  theme(axis.text=element_text(size=20)) +
  theme(axis.title=element_text(size=20,face="bold")) +
  labs(x = "x", y = "y",colour="legend" ) +
  annotation_logticks(sides = "rl") +
  theme(panel.grid.minor = element_blank()) +
  guides(title.hjust=0.5) +
  theme(plot.margin=unit(c(0,1,0,0),"mm"))
plt_wool

Now I need a plot without any outliers. To do this, I first compute the lower and upper whisker bounds with the following code, following a suggestion I found:

sts <- boxplot.stats(yp$x)$stats

To remove the outliers, I add the upper and lower whisker limits as below:

p1 = plt_wool + coord_cartesian(ylim = c(sts*1.05,sts/1.05))

The resulting plot is shown below. While the above line of code correctly removes most of the top outliers, all the bottom outliers still remain. Could someone please suggest how to remove all the outliers completely from this plot? Thanks.

Solution

Based on suggestions by @Sven Hohenstein, @Roland and @lukeA I have solved the problem for displaying multiple boxplots in expanded form without outliers.

First, plot the box plots without outliers by using outlier.colour = NA in geom_boxplot():

plt_wool <- ggplot(subset(df_mlt, value > 0), aes(x=ID1,y=value)) + 
  geom_boxplot(aes(color=factor(ID1)),outlier.colour = NA) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
  theme_bw() +
  theme(legend.text=element_text(size=14), legend.title=element_text(size=14))+
  theme(axis.text=element_text(size=20)) +
  theme(axis.title=element_text(size=20,face="bold")) +
  labs(x = "x", y = "y",colour="legend" ) +
  annotation_logticks(sides = "rl") +
  theme(panel.grid.minor = element_blank()) +
  guides(title.hjust=0.5) +
  theme(plot.margin=unit(c(0,1,0,0),"mm"))

Then compute the lower and upper whisker limits using boxplot.stats(), as in the code below. Since I only take positive values into account, I select them with the condition in subset().

yp <- subset(df, x>0)             # Choosing only +ve values in col x
sts <- boxplot.stats(yp$x)$stats  # Compute lower and upper whisker limits
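
For readers unsure what the five numbers in `$stats` mean, here is a minimal sketch on a made-up vector (the data is hypothetical and not from the question). It only illustrates the ordering that `boxplot.stats()` returns: lower whisker end, lower hinge, median, upper hinge, upper whisker end.

```r
# Hypothetical toy vector with one obvious high outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

s <- boxplot.stats(x)

# $stats has five elements, in this order:
#   [1] lower whisker end   [2] lower hinge   [3] median
#   [4] upper hinge         [5] upper whisker end
s$stats

# $out holds the points lying beyond the whiskers
s$out
```

So `sts[1]` and `sts[5]` are the whisker ends, while `sts[2]` and `sts[4]` are the hinges (roughly the first and third quartiles).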

Now, to get a fully expanded view of the multiple boxplots, modify the y-axis limits of the plot inside coord_cartesian(), as below:

p1 = plt_wool + coord_cartesian(ylim = c(sts[2]/2,max(sts)*1.05))

Note: The limits of y should be adjusted according to the specific case. In this case I have chosen half of sts[2] for ymin; note that sts[2] is the lower hinge returned by boxplot.stats(), while the lower whisker end is sts[1].
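
When the plot contains several groups, one possible way to pick the limits is to take the whisker ends of every group and zoom to their overall range. The sketch below is an illustration under assumptions, not the method used in the answer: the data frame `toy` and the names `stats_by_group` and `ylim_zoom` are made up, and it hides outliers with `outlier.shape = NA` instead of `outlier.colour = NA`.

```r
library(ggplot2)

# Hypothetical toy data: two groups, each with one low and one high outlier
toy <- data.frame(
  ID1   = rep(c("a", "b"), each = 10),
  value = c(1, 100, 102, 104, 106, 108, 110, 112, 114, 5000,
            2, 200, 205, 210, 215, 220, 225, 230, 235, 9000)
)

# One column of boxplot.stats()$stats per group
stats_by_group <- sapply(split(toy$value, toy$ID1),
                         function(v) boxplot.stats(v)$stats)

# Row 1 holds the lower whisker ends, row 5 the upper whisker ends
ylim_zoom <- c(min(stats_by_group[1, ]), max(stats_by_group[5, ]))

# coord_cartesian() zooms the view without dropping any data,
# so the boxes are still computed from all values
p <- ggplot(toy, aes(x = ID1, y = value)) +
  geom_boxplot(aes(colour = ID1), outlier.shape = NA) +
  scale_y_log10() +
  coord_cartesian(ylim = ylim_zoom)
```

One caveat: with `scale_y_log10()`, ggplot2 computes the boxplot statistics on the log-transformed values, so the whiskers it draws can differ from those `boxplot.stats()` gives on the raw values. The limits may still need manual adjustment, as the note above says.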

The resulting plot, p1, is shown below.
