R - emulate the default behavior of hist() with ggplot2 for bin width


Question

I'm trying to plot a histogram for one variable with ggplot2. Unfortunately, the default binwidth of ggplot2 leaves something to be desired:

I've tried to play with binwidth, but I am unable to get rid of that ugly "empty" bin:

Amusingly (to me), the default hist() function of R seems to produce a much better "segmentation" of the bins:

Since I'm doing all my other graphs with ggplot2, I'd like to use it for this one as well - for consistency. How can I produce the same bin "segmentation" of the hist() function with ggplot2?

I tried to input hist at the terminal, but I only got

function (x, ...) 
UseMethod("hist")
<bytecode: 0x2f44940>
<environment: namespace:graphics>

which bears no information for my problem.

I am producing my histograms in ggplot2 with the following code:

ggplot(mydata, aes(x=myvariable)) + 
    geom_histogram(color="darkgray",fill="white", binwidth=61378) + 
    scale_x_continuous("My variable") + 
    scale_y_continuous("Subjects",breaks=c(0,2.5,5,7.5,10,12.5),limits=c(0,12.5)) + 
    theme(axis.text=element_text(size=14),axis.title=element_text(size=16,face="bold"))

One thing I should add: looking at the histogram produced by hist(), it would seem that the bins have a width of 50000 (e.g. from 1400000 to 1600000 there are exactly two bins), yet setting binwidth to 50000 in ggplot2 does not produce the same graph. The graph produced by ggplot2 still has the same gap.

Solution

Without sample data, it's always difficult to get reproducible results, so I've created a sample dataset:

set.seed(16)
mydata <- data.frame(myvariable=rnorm(500, 1500000, 10000))

#base histogram
hist(mydata$myvariable)

As you've learned, hist() is a generic function. If you want to see the different implementations, you can type methods(hist). Most of the time you'll be running hist.default. So if we borrow the break-finding logic from that function, we come up with

brx <- pretty(range(mydata$myvariable), 
    n = nclass.Sturges(mydata$myvariable),min.n = 1)

which is how hist() calculates the breaks by default. We can then use these breaks with the ggplot command:

ggplot(mydata, aes(x=myvariable)) + 
    geom_histogram(color="darkgray",fill="white", breaks=brx) + 
    scale_x_continuous("My variable") + 
    theme(axis.text=element_text(size=14),axis.title=element_text(size=16,face="bold"))

Plotting the two results side by side shows that they are quite similar.
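
One way to check this without comparing plots by eye is to ask hist() for its breaks directly and compare them with the hand-computed ones. The sketch below reuses the sample data and brx from above; the names h and manual_counts are introduced here purely for illustration. It assumes the default "Sturges" rule and the default right-closed intervals of hist.default.

```r
# Reproduce the sample data from the answer.
set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))

# Breaks computed by hand, as in the answer.
brx <- pretty(range(mydata$myvariable),
              n = nclass.Sturges(mydata$myvariable), min.n = 1)

# Breaks and counts as computed by hist() itself, without drawing anything.
h <- hist(mydata$myvariable, plot = FALSE)

# Compare the two sets of breaks.
all.equal(h$breaks, brx)

# Count observations per bin by hand; hist() defaults to right-closed
# intervals with the lowest break included.
manual_counts <- as.vector(table(cut(mydata$myvariable, brx,
                                     right = TRUE, include.lowest = TRUE)))
all.equal(manual_counts, h$counts)
```

If a different rule is passed to hist() through its breaks argument (for example "Scott" or "FD"), the nclass.Sturges() call would need to be swapped for the matching nclass function.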

Also, that empty bin was probably caused by your y-axis limits. If a shape goes outside the range you specify in scale_y_continuous, it simply gets dropped from the plot. It looks like that bin wanted to be 14 tall, but you clipped y at 12.5.
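
If the goal is to restrict the visible y range without losing bars, a commonly used alternative is to set the limits on the coordinate system rather than on the scale. The following is a minimal sketch on the sample data, not the original poster's data; the upper limit of 50 and the names base_plot, p_limited and p_zoomed are assumptions made for illustration.

```r
library(ggplot2)

set.seed(16)
mydata <- data.frame(myvariable = rnorm(500, 1500000, 10000))
brx <- pretty(range(mydata$myvariable),
              n = nclass.Sturges(mydata$myvariable), min.n = 1)

base_plot <- ggplot(mydata, aes(x = myvariable)) +
    geom_histogram(color = "darkgray", fill = "white", breaks = brx)

# Scale limits: bars taller than the upper limit are treated as out of
# bounds and dropped, which leaves a gap. The limit of 50 is an arbitrary
# value chosen to be lower than the tallest bar in this sample.
p_limited <- base_plot + scale_y_continuous("Subjects", limits = c(0, 50))

# Coordinate limits: the view is zoomed, and tall bars are cut off at the
# top of the panel instead of being removed.
p_zoomed <- base_plot + scale_y_continuous("Subjects") +
    coord_cartesian(ylim = c(0, 50))
```

Printing p_limited should show the gap (usually with a warning about removed rows), while p_zoomed keeps every bar. Simply raising or removing the limits in scale_y_continuous would also avoid the gap.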
