`geom_histogram` and `stat_bin()` don't align


Problem description

After constructing a histogram I'd like to add an upper boundary/outline to my plot. I don't want to use geom_bar or geom_col because I don't want the vertical boundaries for each bin.
My attempts have included using geom_histogram and stat_bin(geom = "step"); however, the bins don't align.

I've adjusted parameters within each geom (bins, binwidth, center, boundary) and haven't been able to align these distributions. There have been similar questions on SO (Overlaying geom_points on a geom_histogram or stat_bin) but none seem to have a similar problem to mine or offer a solution.

Here is a case where my geom layers don't align:

set.seed(2019)
library(ggplot2)
library(ggthemes)
library(dplyr)  # provides the %>% pipe used below
df <- data.frame(x = rnorm(100), 
                 y = rep(c("a", "b"), 50))

p <- df %>% 
    ggplot(aes(x, fill = y)) + 
    geom_histogram() + 
    facet_wrap(vars(y)) + 
    theme_fivethirtyeight() + 
    guides(fill = "none")

This is plot p, my base histogram:

p + stat_bin(geom = "step")

I desire a plot where these two geoms align. I've tested a variety of dummy data and this continues to be an issue. Why don't these geoms naturally align? How can I adjust either of these layers to align? Is there a better alternative than combining histogram and stat bin to achieve my desired plot?

Solution

The layers don't naturally align because geom_step appears to use the middle of each histogram bar (the x column in the data frame returned by layer_data(p)) as the location of each change point. Thus, to align the steps, use position_nudge to shift geom_step left by half the binwidth:

library(tidyverse)

p <- df %>% 
  ggplot(aes(x, fill = y)) + 
  geom_histogram(bins = 20) + 
  facet_wrap(vars(y)) + 
  theme_fivethirtyeight() + 
  guides(fill = "none")

# width of the histogram bins, read back from the built layer
binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

# shift the step change points from bin centers to left bin edges
p + stat_bin(geom = "step", binwidth = binwidth, position = position_nudge(x = -0.5 * binwidth))
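A quick way to check the explanation above is to look at the data that stat_bin() computes for the histogram layer. The sketch below is a self-contained variant of the code above under a few assumptions of its own: it recreates the question's simulated data, uses only ggplot2 (no ggthemes theme, no pipes), and uses guides(fill = "none"). It shows that x sits halfway between xmin and xmax, so subtracting half a bin lands exactly on each bar's left edge.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))

p <- ggplot(df, aes(x, fill = y)) +
  geom_histogram(bins = 20) +
  facet_wrap(vars(y)) +
  guides(fill = "none")

# One row per bin per panel: x is where geom_step puts its change points,
# xmin/xmax are the edges of the corresponding histogram bar
ld <- layer_data(p)
head(ld[, c("PANEL", "count", "x", "xmin", "xmax")])

# x lies exactly halfway between the bar edges ...
centers_match <- isTRUE(all.equal(ld$x, (ld$xmin + ld$xmax) / 2))

# ... so moving it left by half a bin gives the left edge of each bar
binwidth <- median(ld$xmax - ld$xmin)
edges_match <- isTRUE(all.equal(ld$x - 0.5 * binwidth, ld$xmin))
```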

Note, however, that in the plot above the step border stops in the middle of the last bar in the left panel, and doesn't bound the left edge of the first bar in the right panel. Below is a hack to get geom_step to completely bound all the histogram bars.

We add two rows of fake data outside the range of the real data, then set the x-range of the plot to include only the range of the real data. In this case I've set the binwidth rather than the number of bins, because extending the data range would increase the binwidth for any fixed number of bins. I've also added a center argument; it isn't necessary, but it can be used to make sure the bins are centered at particular locations, if desired.
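The point about a fixed number of bins can be checked directly. This sketch assumes the question's simulated data and uses a hypothetical helper, width_for_bins(), which is not part of ggplot2: it builds a 20-bin histogram of the real data and of the data with the two fake rows added, then reads the resulting bin width back with layer_data(). The wider data range gives wider bins.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))

# Hypothetical helper for this sketch: the bin width stat_bin() ends up
# using when only the number of bins is fixed
width_for_bins <- function(data, bins) {
  ld <- layer_data(ggplot(data, aes(x)) + geom_histogram(bins = bins))
  median(ld$xmax - ld$xmin)
}

# The real data plus two fake rows one unit beyond each end
padded <- rbind(df, data.frame(x = range(df$x) + c(-1, 1), y = "a"))

w_real   <- width_for_bins(df, bins = 20)
w_padded <- width_for_bins(padded, bins = 20)
c(real = w_real, padded = w_padded)
```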

If this is something you want to do often, you can turn it into a function with some logic to automate expanding the data frame with fake data and setting the bins and the x-range of the plot appropriately.

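p <- df %>% 
  # two fake rows outside the real data; the x scale is shared across
  # facets, so panel "b" also gets the extra empty bins
  add_row(x = range(df$x) + c(-1, 1), y = "a") %>% 
  ggplot(aes(x, fill = y)) + 
  geom_histogram(binwidth = 0.2, center = 0) + 
  facet_wrap(vars(y)) + 
  theme_fivethirtyeight() + 
  guides(fill = "none")

binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

p + 
  stat_bin(geom = "step", binwidth = binwidth, position = position_nudge(x = -0.5 * binwidth)) +
  # df itself is unchanged (add_row() only modified the piped copy),
  # so range(df$x) is already the range of the real data
  coord_cartesian(xlim = range(df$x) + c(-0.2, 0.2))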
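The function suggested above could look roughly like the following sketch. outlined_histogram() is a hypothetical helper, not part of ggplot2 or of the original answer. It assumes columns named x and y (as in the question), places the two fake rows pad_bins bins beyond the real data, and sets scale_x_continuous(expand = c(0, 0)) so that the zoomed window stops one bin past the real data and never shows the fake bars.

```r
library(ggplot2)

# Hypothetical helper (an assumption of this sketch): automates the
# fake-row hack. Expects a numeric column `x` and a column `y` used for
# both fill and faceting.
outlined_histogram <- function(data, binwidth, center = 0, pad_bins = 5) {
  real_range <- range(data$x)

  # Two fake rows several bins beyond the real data. The x scale is shared
  # across facets, so every panel gets empty bins at both ends.
  fake <- data.frame(x = real_range + c(-1, 1) * pad_bins * binwidth,
                     y = data$y[1])
  padded <- rbind(data[, c("x", "y")], fake)

  ggplot(padded, aes(x, fill = y)) +
    geom_histogram(binwidth = binwidth, center = center) +
    # Same bins as the histogram, shifted left by half a bin so each
    # change point lands on a bin edge instead of a bin center
    stat_bin(geom = "step", binwidth = binwidth, center = center,
             position = position_nudge(x = -0.5 * binwidth)) +
    facet_wrap(vars(y)) +
    guides(fill = "none") +
    # No extra x expansion, so the window below excludes the fake bars
    scale_x_continuous(expand = c(0, 0)) +
    # Show the real data range plus one bin on each side
    coord_cartesian(xlim = real_range + c(-1, 1) * binwidth)
}

# Usage with the question's simulated data
set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))
q <- outlined_histogram(df, binwidth = 0.2)
print(q)
```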

Here's what the same plot looks like without the extra-rows hack:

p <- df %>% 
  ggplot(aes(x, fill = y)) + 
  geom_histogram(binwidth = 0.2, center = 0) + 
  facet_wrap(vars(y)) + 
  theme_fivethirtyeight() + 
  guides(fill = "none")

binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

p + 
  stat_bin(geom = "step", binwidth = binwidth, position = position_nudge(x = -0.5 * binwidth))
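A possible alternative to the fake rows, and a swapped-in technique rather than the original answer's, is the pad argument of stat_bin(). With pad = TRUE, stat_bin() adds a zero-count bin at each end of the bins it computes. Setting it only on the step layer leaves the histogram untouched and lets the outline start and end at zero in every panel. The sketch assumes the question's simulated data and omits the ggthemes theme.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))
binwidth <- 0.2

p <- ggplot(df, aes(x, fill = y)) +
  geom_histogram(binwidth = binwidth, center = 0) +
  facet_wrap(vars(y)) +
  guides(fill = "none")

# Same bins as the histogram; pad = TRUE appends an empty bin at each end
# of this layer only, and the nudge moves change points to bin edges
q <- p +
  stat_bin(geom = "step", binwidth = binwidth, center = 0, pad = TRUE,
           position = position_nudge(x = -0.5 * binwidth))
print(q)
```

Because both layers use the same binwidth and center, the step layer's real bins match the histogram's exactly; only the two padded bins are extra, so no coord_cartesian() zoom is needed.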
