R - emulate the default behavior of hist() with ggplot2 for bin width


Question

I'm trying to plot a histogram for one variable with ggplot2. Unfortunately, the default binwidth of ggplot2 leaves something to be desired.

I've tried to play with binwidth, but I am unable to get rid of an ugly "empty" bin.

Amusingly (to me), the default hist() function of R seems to produce a much better "segmentation" of the bins.

Since I'm doing all my other graphs with ggplot2, I'd like to use it for this one as well, for consistency. How can I reproduce the bin "segmentation" of the hist() function with ggplot2?

I tried typing hist at the R prompt, but I only got:

function (x, ...) 
UseMethod("hist")
<bytecode: 0x2f44940>
<environment: namespace:graphics>

which gives no information relevant to my problem.

I am producing my histograms in ggplot2 with the following code:

ggplot(mydata, aes(x=myvariable)) +
    geom_histogram(color="darkgray", fill="white", binwidth=61378) +
    scale_x_continuous("My variable") +
    scale_y_continuous("Subjects", breaks=c(0,2.5,5,7.5,10,12.5), limits=c(0,12.5)) +
    theme(axis.text=element_text(size=14), axis.title=element_text(size=16,face="bold"))

One thing I should add: looking at the histogram produced by hist(), the bins appear to have a width of 50000 (e.g. from 1400000 to 1600000 there are exactly two bins). Setting binwidth to 50000 in ggplot2 does not produce the same graph; the ggplot2 version still has the same gap.

Recommended answer

Without sample data, it's always difficult to get reproducible results, so I've created a sample dataset:

library(ggplot2)  # needed for the ggplot() calls further down

set.seed(16)
mydata <- data.frame(myvariable=rnorm(500, 1500000, 10000))

# base histogram
hist(mydata$myvariable)

As you've learned, hist() is a generic function. If you want to see the different implementations, you can type methods(hist). Most of the time you'll be running hist.default. So if we borrow the break-finding logic from that function, we come up with

brx <- pretty(range(mydata$myvariable), 
    n = nclass.Sturges(mydata$myvariable),min.n = 1)

which is how hist() calculates the breaks by default. We can then use these breaks with the ggplot command

ggplot(mydata, aes(x=myvariable)) + 
    geom_histogram(color="darkgray",fill="white", breaks=brx) + 
    scale_x_continuous("My variable") + 
    theme(axis.text=element_text(size=14),axis.title=element_text(size=16,face="bold"))

Plotted side by side, the two results are quite similar.
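The claim that this reproduces hist()'s breaks can be checked without drawing anything. The following is a minimal sketch in base R, assuming the sample data above; the names x, n_classes, h and breaks_match are introduced here only for illustration.

```r
# Sample data from the answer above
set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))
x <- mydata$myvariable

# Sturges' rule: ceiling(log2(n) + 1) classes; for n = 500 this gives 10
n_classes <- nclass.Sturges(x)

# pretty() treats n as a target, so the final number of bins may differ
brx <- pretty(range(x), n = n_classes, min.n = 1)

# Breaks that hist() itself chooses, computed without plotting
h <- hist(x, plot = FALSE)

# TRUE if the borrowed logic reproduces hist()'s breaks for this data
breaks_match <- isTRUE(all.equal(h$breaks, brx))
print(breaks_match)

# The source of the default method can be inspected with:
# getS3method("hist", "default")
```

Comparing against hist(x, plot = FALSE) rather than eyeballing two plots makes it easy to repeat the check on your own data.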

Also, that empty bin was probably caused by your y-axis limits. If a shape goes outside the limits you specify in scale_y_continuous, it is simply dropped from the plot. It looks like that bin wanted to be 14 tall, but you clipped y at 12.5.
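If you do want to restrict the visible y range, one option is to zoom with coord_cartesian() instead of setting limits on the scale. The sketch below assumes ggplot2 is installed and reuses the sample data; the upper limit of 50 and the names base_plot, p_dropped and p_zoomed are chosen purely for illustration and are not from the original question.

```r
library(ggplot2)

# Sample data and breaks as in the answer above
set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))
brx <- pretty(range(mydata$myvariable),
              n = nclass.Sturges(mydata$myvariable), min.n = 1)

base_plot <- ggplot(mydata, aes(x = myvariable)) +
  geom_histogram(color = "darkgray", fill = "white", breaks = brx)

# Limits on the scale: bars taller than the upper limit fall outside
# the scale range and are removed from the plot (ggplot2 warns about it)
p_dropped <- base_plot +
  scale_y_continuous("Subjects", limits = c(0, 50))

# Limits on the coordinate system: the view is zoomed, no data is removed,
# so tall bars are cut off at the panel edge instead of disappearing
p_zoomed <- base_plot +
  scale_y_continuous("Subjects") +
  coord_cartesian(ylim = c(0, 50))
```

Printing p_dropped and p_zoomed one after the other shows the difference: the first leaves gaps where the tall bars were, the second keeps every bar.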
