Force boxplots from geom_boxplot to constant width


Problem description

I'm making a boxplot in which x and fill are mapped to different variables, a bit like this:

ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) + 
    geom_boxplot()

As in the example above, the widths of my boxes come out differently at different x values, because I do not have all possible combinations of x and fill values.

I would like for all the boxes to be the same width. Can this be done (ideally without manipulating the underlying data frame, because I fear that adding fake data will cause me confusion during further analysis)?

My first thought was

+ geom_boxplot(width=0.5)

but this doesn't help; it adjusts the width of the full set of boxplots for a given x factor level.

This post almost seems relevant, but I don't quite see how to apply it to my situation. Using + scale_fill_discrete(drop=FALSE) doesn't seem to change the widths of the bars.

Solution

The problem arises because some combinations of factor levels are absent from the data. The number of data points for each combination of the levels of cyl and drv can be checked via xtabs:

library(ggplot2)  # provides the mpg dataset
tab <- xtabs( ~ drv + cyl, mpg)

tab

#    cyl
# drv  4  5  6  8
#   4 23  0 32 48
#   f 58  4 43  1
#   r  0  0  4 21

There are three empty cells. I will add fake data to work around the visualization problem.
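The count of empty cells can also be read off programmatically rather than by eye. A minimal sketch, assuming the same mpg data from ggplot2 (the name n_empty is only illustrative):

```r
library(ggplot2)  # provides the mpg dataset

tab <- xtabs(~ drv + cyl, mpg)

# Number of drv/cyl combinations that have no observations
n_empty <- sum(tab == 0)
n_empty
# [1] 3
```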

Check the range of the dependent variable (y-axis). The fake data needs to be out of this range.

range(mpg$cty)
# [1]  9 35

Create a subset of mpg with the data needed for the plot:

tmp <- mpg[c("cyl", "drv", "cty")]

Create an index for the empty cells:

idx <- which(tab == 0, arr.ind = TRUE)

idx

#   row col
# r   3   1
# 4   1   2
# r   3   2

Create three fake lines (with -1 as value for cty):

fakeLines <- apply(idx, 1,
                   function(x) 
                     setNames(data.frame(as.integer(dimnames(tab)[[2]][x[2]]), 
                                         dimnames(tab)[[1]][x[1]], 
                                         -1), 
                              names(tmp)))

fakeLines

# $r
#   cyl drv cty
# 1   4   r  -1
# 
# $`4`
#   cyl drv cty
# 1   5   4  -1
# 
# $r
#   cyl drv cty
# 1   5   r  -1

Add the rows to the existing data:

tmp2 <- rbind(tmp, do.call(rbind, fakeLines))
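The same placeholder rows can probably be built without apply(), by indexing the dimension names of the table directly. The sketch below is an equivalent formulation under that assumption, not the answer's own code; the names fake and filled are illustrative. It ends with a check that every drv/cyl cell is now populated.

```r
library(ggplot2)  # provides the mpg dataset

tmp <- as.data.frame(mpg[c("cyl", "drv", "cty")])
tab <- xtabs(~ drv + cyl, tmp)
idx <- which(tab == 0, arr.ind = TRUE)

# One placeholder row per empty cell; -1 lies below the observed range of cty
fake <- data.frame(cyl = as.integer(colnames(tab)[idx[, "col"]]),
                   drv = rownames(tab)[idx[, "row"]],
                   cty = -1,
                   stringsAsFactors = FALSE)

tmp2 <- rbind(tmp, fake)

# TRUE if every combination of drv and cyl now has at least one row
filled <- all(xtabs(~ drv + cyl, tmp2) > 0)
filled
```

Keeping the fake rows in a separate data frame such as tmp2, rather than in mpg itself, limits the risk of them leaking into later analysis.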

Plot:

library(ggplot2)
# coord_cartesian() zooms the y-axis without dropping rows, so the fake
# values (cty = -1) stay in the data but fall outside the visible range.
ggplot(tmp2, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) + 
  geom_boxplot() +
  coord_cartesian(ylim = c(min(tmp$cty) - 3, max(tmp$cty) + 3))
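
If adding placeholder rows is undesirable, as the question notes, newer ggplot2 releases offer a different route. The sketch below assumes ggplot2 3.0.0 or later, where position_dodge2() accepts a preserve argument. With preserve = "single", each box keeps the width of a single dodged element, so the original data frame is left untouched. This is an alternative technique, not the approach used in the answer above.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.0.0 (position_dodge2() with a `preserve` argument).
# preserve = "single" sizes every box as one dodged element instead of
# stretching the boxes to fill the total width available at each x value.
p <- ggplot(mpg, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))
p
```

With this variant, boxes at x values that have fewer fill groups are centred on the tick rather than occupying fixed slots, so the positions differ slightly from the fake-data plot even though the widths match.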
