ggplot dotplot: What is the proper use of geom_dotplot?

Problem description

My purpose is to reproduce this figure [ref] with ggplot2 (author: Hadley Wickham).

Here is my effort based on geom_point and some ugly data preparation (see code further down):

How could I do that with geom_dotplot()?

In my attempts I have encountered several problems: (1) map the default density produced by geom_dotplot to a count, (2) cut off the axis, (3) not have unexpected holes. I gave up and hacked geom_point() instead.

I expected (and still hope) it would be as simple as

ggplot(data, aes(x,y)) + geom_dotplot(stat = "identity")

but no. So here's what I've tried and the output:

# Data
df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))

# dotplot based on geom_dotplot
geom_dots <- function(x, count, round = 10, breaks = NULL, ...) {
    require(ggplot2)
    n = sum(count) # total number of dots to be drawn
    b = round*round(n/round) # prettify breaks
    x = rep(x, count) # make x coordinates for dots
    if (is.null(breaks))  breaks = seq(0, 1, b/4/n)
    ggplot(data.frame(x = x), aes(x = x)) +
        geom_dotplot(method = "histodot", ...) +
        scale_y_continuous(breaks = breaks, 
                        #limits = c(0, max(count)+1), # doesn't work
                        labels = breaks * n) 
} 

geom_dots(x = df$x, count = df$y) 

# dotplot based on geom_point
ggplot_dot <- function(x, count, ...) {
    require(ggplot2)
    message("The count variable must be an integer")
    count = as.integer(count) # make sure these are counts
    n = sum(count) # total number of dots to be drawn
    x = rep(x, count) # make x coordinates for dots
    count = count[count > 0]  # drop zero cases 
    y = integer(0)  # initialize y coordinates for dots
    for (i in seq_along(count)) 
        y <- c(y, 1:(count[i]))  # compute y coordinates
    ggplot(data.frame(x = x, y = y), aes(x = x, y = y)) +
        geom_point(...)  # draw one dot per positive count
}

ggplot_dot(x = df$x, count = df$y, 
    size = 11, shape = 21, fill = "orange", color = "black") + theme_gray(base_size = 18)
# ggsave("dotplot.png") 
ggsave("dotplot.png", width = 12, height = 5.9)

Brief random comment: With the geom_point() solution, saving the plot involves tweaking the sizes just right to ensure that the dots are in contact (both the dot size and the plot height/width). With the geom_dotplot() solution, I rounded the labels to make them prettier. Unfortunately I was not able to cut off the axis at about 100: using limits() or coord_cartesian() results in a rescaling of the entire plot and not a cut. Note also that to use geom_dotplot() I created a vector of data based on the count, as I was unable to use the count variable directly (I expected stat="identity" to do that, but I couldn't make it work).
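
The data preparation for the geom_point() version can be sketched without the explicit loop. This is only an illustration using base R, not part of the original attempt: `rep()` repeats each x value once per dot and `sequence()` numbers the dots within each column, silently skipping zero counts. The name `dots` is made up for this example.

```r
# Counts per x value, copied from the data above
x <- 79:105
count <- as.integer(c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10,
                      13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1))

# One row per dot: x is repeated count times, y runs 1..count in each column
dots <- data.frame(x = rep(x, count), y = sequence(count))

head(dots, 3)  # first dots: one at x = 79, two stacked at x = 82
```

The resulting `dots` data frame could then be passed to `ggplot(dots, aes(x, y)) + geom_point(...)` in place of the loop inside `ggplot_dot()`.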

Solution

Coincidentally, I've also spent the past day fighting with geom_dotplot() and trying to make it show a count. I haven't figured out a way to make the y axis show actual numbers, but I have found a way to truncate the y axis. As you mentioned, coord_cartesian() and limits don't work, but coord_fixed() does, since it enforces a fixed aspect ratio between y and x units (the ratio argument is expressed as y / x):

library(tidyverse)
df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))
df <- tidyr::uncount(df, y) 

ggplot(df, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) + 
  # Make this as high as the tallest column
  coord_fixed(ratio = 15)

Using 15 as the ratio here works because the x-axis is also in the same units (i.e. single integers). If the x-axis is a percentage or log dollars or date or whatever, you have to tinker with the ratio until the y-axis is truncated enough.
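
As a rough starting point rather than a rule, the ratio can be derived from the data instead of being hard-coded. The sketch below assumes that each bin holds exactly one distinct x value and that the required ratio is roughly the height of the tallest column times the bin width; that second assumption is my extrapolation from the integer case above, so expect to tinker with it for other units. The names `dots`, `tallest` and `ratio` are illustrative only.

```r
# Rebuild the one-row-per-dot data from the counts
x <- 79:105
count <- c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10,
           13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)
dots <- data.frame(x = rep(x, count))

binwidth <- 1
tallest <- max(table(dots$x))   # number of dots in the tallest column
ratio <- tallest * binwidth     # assumed starting value for coord_fixed()

# Only build the plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dots, aes(x)) +
    geom_dotplot(method = "histodot", binwidth = binwidth) +
    scale_y_continuous(NULL, breaks = NULL) +
    coord_fixed(ratio = ratio)
  # print(p) draws the plot
}
```

For this data the computed value matches the 15 used above.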


Edited with method for combining plots

As I mentioned in a comment below, using patchwork to combine plots with coord_fixed() doesn't work well. However, if you manually set the heights (or widths) of the combined plots to the same values as the ratio in coord_fixed() and ensure that each plot has the same x axis, you can get pseudo-faceted plots:

# Make a subset of df
df2 <- df %>% slice(1:25)

plot1 <- ggplot(df, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) + 
  # Make this as high as the tallest column
  # Make xlim the same on both plots
  coord_fixed(ratio = 15, xlim = c(75, 110))

plot2 <- ggplot(df2, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) + 
  coord_fixed(ratio = 7, xlim = c(75, 110))

# Combine both plots in a single column, with each sized incorrectly
library(patchwork)
plot1 + plot2 +
  plot_layout(ncol = 1)

# Combine both plots in a single column, with each sized appropriately
library(patchwork)
plot1 + plot2 +
  plot_layout(ncol = 1, heights = c(15, 7) / (15 + 7))
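
The 15 and 7 used above are simply the tallest columns in each data set, so they can be computed rather than typed in. This is a minimal sketch in base R under the same assumption as before (integer x values, binwidth of 1); `dots`, `dots2`, `tallest` and `heights` are hypothetical names for this example.

```r
# One row per dot, as produced by tidyr::uncount() above
x <- 79:105
count <- c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10,
           13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)
dots <- data.frame(x = rep(x, count))
dots2 <- dots[1:25, , drop = FALSE]   # same subset as slice(1:25)

# Tallest column in each data set
tallest <- c(max(table(dots$x)), max(table(dots2$x)))

# Relative panel heights, proportional to the tallest columns
heights <- tallest / sum(tallest)
```

Each element of `tallest` would go into the matching `coord_fixed(ratio = ...)` call, and `heights` into `plot_layout(ncol = 1, heights = heights)`.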
