Values getting dropped from ggplot2 histogram when specifying limits
Question
I'd like to create a ggplot2 histogram in which the plot's limits are equal to the smallest and largest values in the data set, without excluding those values from the actual histogram.
I get the behavior I'm looking for when using base graphics. Specifically, the second histogram below shows all of the same values as the first histogram (i.e., no bins are excluded in the second histogram), even though I've passed an xlim argument to the second plot:
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)
hist(mtcars$wt, breaks = 30, main = "No limits added")
hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")
ggplot2 isn't giving me this behavior, though:
library(ggplot2)
# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")
p + xlim(xlim) + ggtitle("Limits added")
See how in the second plot I lose one of the points below 2 and two of the points above 5? I would like to know how to fix this. A few miscellaneous notes:
First, specifying boundary allows me to include the minimum values (i.e., those below 2) in the histogram, but I still don't have a solution for the 2 values greater than 5 that are getting dropped:
ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
  xlim(xlim) +
  ggtitle("Limits added with boundary too")
Second, whether the issue appears depends on the value chosen for bins. For example, when I increase bins to 50, I don't get any dropped values:
ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 50, colour = "green", boundary = min_wt) +
  xlim(xlim) +
  ggtitle("Limits added with boundary too, but with bins = 50")
Finally, I believe this issue is related to the one presented on SO in "geom_histogram: wrong bins?" and discussed here as well: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think this issue is related to a "rounding error." I describe this error in more depth in my second post (the one with the graphs shown in it) on this issue: https://github.com/daattali/ggExtra/issues/81.
Here is my session info:
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.3-2 scales_0.5.0.9000
[4] compiler_3.4.2 lazyeval_0.2.1 plyr_1.8.4
[7] tools_3.4.2 pillar_1.2.1 gtable_0.2.0
[10] tibble_1.4.2 yaml_2.1.16 Rcpp_0.12.15
[13] grid_3.4.2 rlang_0.2.0.9000 munsell_0.4.3
Recommended answer
Another option, besides the one @eipi10 mentioned in the comments, is to change the oob (out of bounds) argument in scale_x_continuous.
The documentation describes oob as: "Function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA."
The default is scales::censor(). You can change it to oob = scales::squish, which squishes out-of-bounds values into the range of the limits.
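To make the difference concrete, here is a minimal sketch that applies both functions to a small made-up vector. The values and the range c(2, 5) are illustrative assumptions, not taken from mtcars.

```r
library(scales)

# Hypothetical values: 1 and 6 lie outside the range c(2, 5)
x <- c(1, 3, 6)
rng <- c(2, 5)

censored <- censor(x, range = rng)  # out-of-range values become NA
squished <- squish(x, range = rng)  # out-of-range values are clamped to the nearest limit

print(censored)  # NA  3 NA
print(squished)  # 2 3 5
```

As far as I can tell, what falls outside the limits in the histogram is the outer edge of the first and last bars rather than the raw data, which would explain why squishing keeps those bars while censoring drops them.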
Compare the following two plots.
p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")
# Warning: Removed 1 rows containing missing values (geom_bar).
p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")
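One way to check the difference without eyeballing the plots is to inspect the computed bins. The sketch below uses layer_data() from ggplot2. It assumes (I have not verified this on every ggplot2 version) that bars censored by the scale show up with NA in xmin or xmax. The object names are hypothetical.

```r
library(ggplot2)

# Same values as xlim above; named lims to avoid confusion with the xlim() function
lims <- range(mtcars$wt)
base_plot <- ggplot(mtcars, aes(x = wt)) + geom_histogram(bins = 30)

# Computed bins under the default oob (scales::censor) and under scales::squish
bins_censor <- layer_data(base_plot + scale_x_continuous(limits = lims))
bins_squish <- layer_data(
  base_plot + scale_x_continuous(limits = lims, oob = scales::squish)
)

# Bars with an NA edge are the ones that would be removed when the plot is drawn
bins_censor[is.na(bins_censor$xmin) | is.na(bins_censor$xmax),
            c("xmin", "xmax", "count")]

# With squish, every observation should still be counted in some bar
sum(bins_squish$count) == nrow(mtcars)
```

If the first expression returns rows with a nonzero count, those are the values that disappear from the drawn histogram.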
Your third ggplot, where you specified a boundary but still had 2 values greater than 5 dropped, would look like this:
ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
  scale_x_continuous(limits = xlim, oob = scales::squish) +
  ggtitle("Limits added with boundary too") +
  labs(subtitle = "scales::squish")
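A related option, which is not part of the approach above, is to zoom with coord_cartesian() instead of setting scale limits, since coordinate limits change the visible window without removing data. Whether this matches what was suggested in the comments is an assumption on my part, so treat it as a sketch only.

```r
library(ggplot2)

lims <- range(mtcars$wt)

# Zoom the view to the data range; the scale has no limits, so nothing is censored
p_zoom <- ggplot(mtcars, aes(x = wt)) +
  geom_histogram(bins = 30, colour = "green") +
  coord_cartesian(xlim = lims) +
  ggtitle("Zoomed with coord_cartesian")

p_zoom
```

Note that the visible axis may not end exactly at the smallest and largest values, because the coordinate system adds some padding by default.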
Hope this helps.