ggplot2: Adding sample size information to x-axis tick labels


Problem description


This question is related to "Create custom geom to compute summary statistics and display them *outside* the plotting region". (Note: all functions have been simplified; there are no error checks for correct object types, NAs, etc.)

In base R, it is quite easy to create a function that produces a stripchart with the sample size indicated below each level of the grouping variable: you can add the sample size information using the mtext() function:

stripchart_w_n_ver1 <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
    # Need to call plot.default() instead of plot() because
    # plot() produces boxplots when x is a factor.
    plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
    levels.x <- levels(x)
    x.ticks <- seq_along(levels.x)
    axis(1, at = x.ticks, labels = levels.x)
    n <- sapply(split(y, x), length)
    mtext(paste0("N=", n), side = 1, line = 2, at = x.ticks)
}

stripchart_w_n_ver1(mtcars, "cyl", "mpg")

or you can add the sample size information to the x-axis tick labels using the axis() function:

stripchart_w_n_ver2 <- function(data, x.var, y.var) {
    x <- factor(data[, x.var])
    y <- data[, y.var]
    # Need to set the second element of mgp to 1.5
    # to allow room for two lines for the x-axis tick labels.
    o.par <- par(mgp = c(3, 1.5, 0))
    on.exit(par(o.par))
    # Need to call plot.default() instead of plot() because
    # plot() produces boxplots when x is a factor.
    plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
    n <- sapply(split(y, x), length)
    levels.x <- levels(x)
    axis(1, at = 1:length(levels.x), labels = paste0(levels.x, "\nN=", n))
}

stripchart_w_n_ver2(mtcars, "cyl", "mpg")
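Both base R versions rely on the same per-group count. The sketch below isolates that step so the tick labels can be inspected without drawing anything. The helper name n_labels is hypothetical (it is not in the original functions), and unlike the functions above it skips missing y-values, which is an assumption about the desired behavior.

```r
# Hypothetical helper (not part of the original post): build the two-line
# tick labels used by stripchart_w_n_ver2() without drawing a plot.
n_labels <- function(data, x.var, y.var) {
  x <- factor(data[[x.var]])
  y <- data[[y.var]]
  # Count the non-missing observations of y within each level of x.
  n <- vapply(split(y, x), function(v) sum(!is.na(v)), integer(1))
  paste0(levels(x), "\nN=", n)
}

labs_cyl <- n_labels(mtcars, "cyl", "mpg")
print(labs_cyl)
# [1] "4\nN=11" "6\nN=7"  "8\nN=14"
```

The resulting vector can be passed straight to the labels argument of axis().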

While this is a very easy task in base R, it is maddeningly complex in ggplot2 because it is very hard to get at the data being used to generate the plot, and while there are functions equivalent to axis() (e.g., scale_x_discrete()), there is no equivalent to mtext() that lets you easily place text at specified coordinates within the margins.

I tried using the built-in stat_summary() function to compute the sample sizes (i.e., fun.y = "length") and then place that information on the x-axis tick labels, but as far as I can tell, you can't extract the sample sizes and then add them to the x-axis tick labels using scale_x_discrete(); you have to tell stat_summary() which geom to use. You could set geom = "text", but then you have to supply the labels, and the point is that the labels should be the sample sizes, which stat_summary() is computing but which you can't get at. You would also have to specify where to place the text, and it is difficult to work out a position that lies directly underneath the x-axis tick labels.
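For reference, one way to let stat_summary() produce the label itself is to pass a fun.data function that returns a label column alongside y. This is only a minimal sketch: the helper name n_fun is hypothetical, and the text is still drawn inside the plotting region (just below the lowest point of each group), not on the tick labels, so it does not solve the problem described above.

```r
library(ggplot2)

# Hypothetical helper: for the y-values of one group, return a one-row
# data frame giving the text position (y) and the text itself (label).
n_fun <- function(y) {
  data.frame(y = min(y), label = paste0("n=", length(y)))
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point() +
  # geom = "text" picks up the label column returned by n_fun();
  # vjust = 1.5 nudges the text below the lowest point of each group.
  stat_summary(fun.data = n_fun, geom = "text", vjust = 1.5)

p
```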

The vignette "Extending ggplot2" (http://docs.ggplot2.org/dev/vignettes/extending-ggplot2.html) shows you how to create your own stat function that allows you to get directly at the data, but the problem is that you always have to define a geom to go with your stat function (i.e., ggplot thinks you want to plot this information within the plot, not in the margins); as far as I can tell, you can't take the information you compute in your custom stat function, not plot anything in the plot area, and instead pass the information to a scales function like scale_x_discrete(). Here was my try at doing it this way; the best I could do was place the sample size information at the minimum value of y for each group:

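library(ggplot2)

StatN <- ggproto("StatN", Stat,
    required_aes = c("x", "y"),
    compute_group = function(data, scales) {
        # Drop missing y-values before counting.
        y <- data$y
        y <- y[!is.na(y)]
        n <- length(y)
        # One row per group: the label is placed at the group's minimum y.
        data.frame(x = data$x[1], y = min(y), label = paste0("n=", n))
    }
)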

stat_n <- function(mapping = NULL, data = NULL, geom = "text", 
    position = "identity", inherit.aes = TRUE, show.legend = NA, 
        na.rm = FALSE, ...) {
    ggplot2::layer(stat = StatN, mapping = mapping, data = data, geom = geom, 
        position = position, inherit.aes = inherit.aes, show.legend = show.legend, 
        params = list(na.rm = na.rm, ...))
}

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point() + stat_n()

I thought I had solved the problem by simply creating a wrapper function to ggplot:

ggstripchart <- function(data, x.name, y.name,  
    point.params = list(), 
    x.axis.params = list(labels = levels(x)), 
    y.axis.params = list(), ...) {
    if (!is.factor(data[, x.name]))
        data[, x.name] <- factor(data[, x.name])
    x <- data[, x.name]
    y <- data[, y.name]
    params <- list(...)
    point.params    <- modifyList(params, point.params)
    x.axis.params   <- modifyList(params, x.axis.params)
    y.axis.params   <- modifyList(params, y.axis.params)

    point <- do.call("geom_point", point.params)

    stripchart.list <- list(
        point, 
        theme(legend.position = "none")
    )

    n <- sapply(split(y, x), length)
    x.axis.params$labels <- paste0(x.axis.params$labels, "\nN=", n)
    x.axis <- do.call("scale_x_discrete", x.axis.params)
    y.axis <- do.call("scale_y_continuous", y.axis.params)
    stripchart.list <- c(stripchart.list, x.axis, y.axis)           

    ggplot(data = data, mapping = aes_string(x = x.name, y = y.name)) + stripchart.list
}


ggstripchart(mtcars, "cyl", "mpg")

However, this function does not work correctly with faceting. For example:

ggstripchart(mtcars, "cyl", "mpg") + facet_wrap(~am)

shows, in each facet, the sample sizes for both facets combined. I would have to build faceting into the wrapper function, which defeats the point of trying to use everything ggplot has to offer.
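The wrapper gets this wrong because n is computed once from the full data set, before ggplot splits it into panels. A rough workaround, sketched below under the assumption that the faceting variable (am here) is known in advance, is to count per x-level and per facet and draw the counts with geom_text() at the bottom of each panel. The data frame name counts and its columns n and label are made up for this illustration, and the text sits inside the panel rather than on the tick labels.

```r
library(ggplot2)

# Count observations for every combination of x-level (cyl) and facet (am).
counts <- aggregate(mpg ~ cyl + am, data = mtcars, FUN = length)
names(counts)[3] <- "n"
counts$label <- paste0("N=", counts$n)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point() +
  # y = -Inf pins the text to the bottom edge of each panel;
  # vjust = -0.5 lifts it just above that edge.
  geom_text(data = counts,
            aes(x = factor(cyl), y = -Inf, label = label),
            vjust = -0.5, inherit.aes = FALSE) +
  facet_wrap(~am)

p
```

Because counts carries the am column, facet_wrap() routes each label to its own panel.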

If anyone has any insights into this problem, I would be grateful. Thanks so much for your time!

Solution

I have updated the EnvStats package to include a stat called stat_n_text which will add the sample size (the number of unique y-values) below each unique x-value. See the help file for stat_n_text for more information and a list of examples. Below is a simple example:

library(ggplot2)
library(EnvStats)

p <- ggplot(mtcars, 
  aes(x = factor(cyl), y = mpg, color = factor(cyl))) + 
  theme(legend.position = "none")

p + geom_point() + 
  stat_n_text() + 
  labs(x = "Number of Cylinders", y = "Miles per Gallon")
