Values getting dropped from ggplot2 histogram when specifying limits

Problem description

I'd like to create a ggplot2 histogram in which the plot's limits are equal to the smallest and largest values in the data set, without excluding those values from the actual histogram.

I get the behavior I'm looking for when using base graphics. Specifically, the second histogram below shows all of the same values as the first histogram (i.e., no bins are excluded in the second histogram), even though I've included an xlim argument to the second plot:

min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)

hist(mtcars$wt, breaks = 30, main = "No limits added")

hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")

ggplot2 isn't giving me this behavior though:

library(ggplot2)

# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")

p + xlim(xlim) + ggtitle("Limits added") 

See how, in the second plot, I lose one of the points below 2 and two of the points above 5? I would like to know how to fix this. A few miscellaneous notes:

First, specifying boundary allows me to include the minimum values (i.e., those below 2) in the histogram, but I still don't have a solution to the 2 values greater than 5 that are getting dropped:

ggplot(mtcars, aes(x = wt)) + 
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) + 
  xlim(xlim) +
  ggtitle("Limits added with boundary too")

Second, the presence of the issue is dependent on the value chosen for bins. For example, when I increase bins to be 50, I don't get any dropped values:

ggplot(mtcars, aes(x = wt)) + 
  geom_histogram(bins = 50, colour = "green", boundary = min_wt) + 
  xlim(xlim) +
  ggtitle("Limits added with boundary too, but with bins = 50")

Finally, I believe this issue is related to the one presented on SO here: geom_histogram: wrong bins? and discussed here as well: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think this issue is related to a "rounding error." I describe this error in more depth in my second post (the one with the graphs shown in it) on this issue: https://github.com/daattali/ggExtra/issues/81.
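To make the "rounding error" idea concrete, here is a minimal sketch of how bin edges computed in floating point may not land exactly on the data maximum. This is not ggplot2's internal binning code; the names width and breaks and the formula used are assumptions made purely for illustration.

```r
# Hypothetical illustration only: NOT how ggplot2 computes its bins internally.
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
bins <- 30

# Evenly spaced bin edges starting exactly at the minimum.
width <- (max_wt - min_wt) / bins
breaks <- min_wt + width * 0:bins

# Difference between the last computed edge and the true maximum.
# Depending on the platform this may be exactly zero or a tiny nonzero number.
print(breaks[bins + 1] - max_wt, digits = 17)

# An exact comparison can fail where a tolerance-based one succeeds.
breaks[bins + 1] == max_wt
isTRUE(all.equal(breaks[bins + 1], max_wt))
```

If the last edge overshoots the upper limit by even a tiny amount, a scale that censors out-of-bounds values would treat that bin as out of range, which would be consistent with the dropped bars described above.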

Here's my session info:

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] labeling_0.3      colorspace_1.3-2  scales_0.5.0.9000
 [4] compiler_3.4.2    lazyeval_0.2.1    plyr_1.8.4       
 [7] tools_3.4.2       pillar_1.2.1      gtable_0.2.0     
[10] tibble_1.4.2      yaml_2.1.16       Rcpp_0.12.15     
[13] grid_3.4.2        rlang_0.2.0.9000  munsell_0.4.3 

Recommended answer

Another option, besides what @eipi10 mentioned in the comments, is to change the oob (out of bounds) argument of scale_x_continuous. The documentation describes it as follows:

Function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA.

The default is scales::censor(). You can change it to oob = scales::squish, which squishes out-of-bounds values into the range instead of replacing them with NA.
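As a small sketch of the difference, the two functions can be called directly on a plain numeric vector. The values in lims and edges are made up for illustration (think of them as bin edges that fall slightly outside the limits); they are not taken from mtcars.

```r
# Hypothetical limits and bin edges, chosen only to illustrate the two functions.
lims <- c(1.5, 5.5)
edges <- c(1.4, 3.0, 5.6)

# censor(): values outside the range become NA (the default oob behaviour).
censored <- scales::censor(edges, range = lims)
censored
# NA 3 NA

# squish(): values outside the range are moved to the nearest limit.
squished <- scales::squish(edges, range = lims)
squished
# 1.5 3.0 5.5
```

A bar whose edge is turned into NA cannot be drawn, whereas a squished edge is simply clamped to the limit, so the bar survives.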

Compare the following two plots.

p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")

warning: Removed 1 rows containing missing values (geom_bar).

p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")

Your third ggplot, where you specified a boundary but two values greater than 5 were still dropped, would look like this:

ggplot(mtcars, aes(x = wt)) + 
 geom_histogram(bins = 30, colour = "green", boundary = min_wt) + 
 scale_x_continuous(limits = xlim, oob = scales::squish) +
 ggtitle("Limits added with boundary too") +
 labs(subtitle = "scales::squish")

Hope this helps.
