Overlay geom_line within a categorical x axis for each group - ggplot2


Question

I want to make a plot like this:

The boxes would represent the distribution of a continuous variable within groups; the red circles are the points showing all of the actual observations. So far, so good. This would be simple with geom_boxplot + geom_point with a group aesthetic.

Here are the two twists:

  1. The horizontal position of the points is not a random jitter. It is instead an X-Y coordinate that uses a continuous X axis rather than a categorical axis.
  2. The line is a trendline that is fit on those points.

Some context: This plot is showing usage of a product (Y axis) vs allowed usage (X). The X axis groups are mutually exclusive, discrete tiers on what is essentially an infinite, continuous variable for usage. EG, 1-4, 5-9, 10-20, etc. It doesn't feel crazy to me from a visual standpoint to plot the continuous within those groups, does that make sense? But I have no idea how I'd get started on getting ggplot2 to agree with me.
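
As a rough illustration of the tiers described above, a continuous allowance could be binned with base R's cut(). The break points are assumptions taken from the "1-4, 5-9, 10-20" example; the sample data below already carries its own group column, so this step is only a sketch.

```r
# Hypothetical tier boundaries, taken from the "1-4, 5-9, 10-20" example in the text.
allowed <- c(2, 20, 3, 5, 1, 10, 7, 12)

# cut() uses right-closed intervals by default: (0,4], (4,9], (9,20]
tier <- cut(
  allowed,
  breaks = c(0, 4, 9, 20),
  labels = c("1-4", "5-9", "10-20")
)

table(tier)
```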

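My preference would be to have the boxes evenly spaced on the X axis, but if I need to start with a continuous axis and have the groups take up proportional space on the X axis, I'd settle for that (perhaps using a log axis to keep the lower, narrow groups from being completely squashed).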
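
The fallback mentioned above (one continuous, log-scaled X axis with each group taking up proportional space) might look roughly like the sketch below. It uses a subset of the sample rows as stand-in data; the object name p_log and the red point colour are illustrative choices, not part of the original post.

```r
library(ggplot2)

# Stand-in data: a subset of the question's sample rows (usage == 98 left out).
df_small <- data.frame(
  allowed = c(2, 3, 3, 5, 5, 1, 1, 5, 20, 10, 7, 12, 23, 10),
  usage   = c(1, 2, 5, 4, 1, 2, 9, 4, 4, 6, 1, 2, 3, 2),
  group   = rep(c("A", "B"), times = c(8, 6))
)

p_log <- ggplot(df_small, aes(allowed, usage)) +
  geom_boxplot(aes(group = group)) +           # one box per group, spanning that group's x range
  geom_point(colour = "red") +
  geom_smooth(aes(group = group), method = "lm",
              formula = y ~ x, se = FALSE) +   # one trendline per group
  scale_x_log10()                              # keeps the narrow low tiers readable
p_log
```

With this approach the box widths follow the data range of each group, so the boxes are not evenly spaced.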

This should work as example data:


df <- structure(
  list(
    usage = c(1L, 4L, 2L, 5L, 4L, 1L, 2L, 98L, 9L, 4L,
              6L, 6L, 1L, 2L, 2L, 2L, 3L, 2L, 5L, 1L),
    allowed = c(2, 20, 3, 3, 5, 5, 1, 1, 1, 5,
                10, 5, 7, 12, 2, 5, 23, 10, 5, 2),
    id = c(1055L, 2155L, 6637L, 11068L, 2070L, 8524L, 9157L, 5963L, 7593L, 3470L,
           3557L, 7469L, 9142L, 408L, 9446L, 1552L, 4788L, 7233L, 8464L, 2188L),
    group = c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A",
              "B", "A", "B", "B", "A", "A", "B", "B", "A", "A")
  ),
  row.names = c(NA, -20L),
  class = c("tbl_df", "tbl", "data.frame")
)

Recommended answer

Here's what I have for you:

library(ggplot2)
library(dplyr)

# you had a usage value of 98 that was throwing everything off
df <- df %>% dplyr::filter(usage < 50)

p <- ggplot(df, aes(allowed, usage)) +
  geom_boxplot(aes(group=group)) +
  geom_point() +
  geom_smooth(alpha=0, method='lm') +
  facet_wrap(~group, scales='free_x', strip.position = 'bottom') +
  theme_classic() +
  theme(
    axis.text.x = element_blank(),       # remove x axis text
    axis.ticks.x = element_blank(),      # remove tick marks on x axis
    axis.title.x = element_blank(),      # remove title for axis
    strip.background = element_blank(),  # no box on facet label
    strip.placement = 'outside',         # facet label is outside axis line
    strip.text = element_text(size=12),
    panel.spacing.x = unit(0, 'pt')      # remove space between facets
  )
p

The general idea is to consider that you kind of have 2 x axes here. The primary axis by which you want to plot your points is df$allowed, whereas you then want to group based on df$group. The easiest solution I can think of here was to treat each value of df$group as a separate facet, and then "stitch" the facets together by setting the space in-between them to zero. Seems to work well.
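
To see why the stitched facets behave like one axis made of two independent pieces, it can help to look at the x range each facet gets under scales='free_x'. The base R sketch below uses the question's allowed and group columns with the usage == 98 row already dropped; the name x_ranges is illustrative.

```r
# The question's sample columns after filtering out the usage == 98 row.
allowed <- c(2, 20, 3, 3, 5, 5, 1, 1, 5, 10, 5, 7, 12, 2, 5, 23, 10, 5, 2)
group   <- c("A", "B", "A", "A", "A", "A", "A", "A", "A", "B",
             "A", "B", "B", "A", "A", "B", "B", "A", "A")

# Each facet's x axis is trained only on its own group's values.
x_ranges <- lapply(split(allowed, group), range)
x_ranges
```

Group A spans 1 to 5 and group B spans 7 to 23, so each panel has its own continuous scale even though the panels sit flush against each other.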

The only other comment is that the boxes might be a bit too close together for your liking, which makes it hard to tell the points of one group from those of another. Since each group is a facet, and therefore a completely separate plot, you can "squish" the boxes within their panels by expanding the primary x axis of each facet, like so:

p + scale_x_continuous(expand=expansion(mult=c(0.8)))
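
A multiplicative expansion pads each side of a facet's x range by a fraction of that range. The arithmetic can be sketched in base R, assuming a facet whose data runs from 1 to 5 (roughly group A after filtering); the variable names are illustrative.

```r
# Base R arithmetic only; mirrors what a multiplicative expansion of 0.8 does to one x range.
x_range <- c(1, 5)
mult <- 0.8

pad <- mult * diff(x_range)                        # 80% of the range added on each side
expanded <- c(x_range[1] - pad, x_range[2] + pad)  # new axis limits
expanded
```

Because the padding scales with each facet's own range, every panel gains the same proportion of empty space on both sides, which is what pulls the boxes away from the panel edges.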

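NOTE: I had to remove the ultra-high value in usage to be able to actually see your plot properly. I'm guessing that it's an artifact of replicating the data (like a missing value).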
