ggplot2 boxplot medians aren't plotting as expected


Problem description


So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:

require(reshape2)
require(ggplot2)
require(scales)
require(grid)
require(gridExtra)

df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))

d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) + 
    facet_grid(station~.) +
    scale_y_continuous(limits = c(0, 15)) + 
    theme(legend.position = "none")
d

However, when you dig a little deeper, problems creep in that freak me out. When I label the boxplot medians with their values, the following plot results.

df.m <- aggregate(value~year+station, data = df, FUN = function(x) median(x))
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value)) 
d

The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis values, but the middle lines of the boxes are definitely not at the medians. I've been stumped by this for a few days now.

What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?

Solution

The cause lies in how scale_y_continuous is applied. ggplot2 performs operations in the following order:

  1. Scale Transformations
  2. Statistical Computations
  3. Coordinate Transformations

In this case, because limits are set on the scale, ggplot2 discards the data outside those limits before it computes the boxplot statistics (the median and the hinges). The medians calculated by aggregate and used in geom_text, however, are computed from the entire dataset. The median line drawn in each box and the text label can therefore disagree.
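The effect can be reproduced without ggplot2 at all. The following is a minimal sketch in base R, using made-up values rather than the data from the question; the ifelse() call only imitates what the scale does to out-of-range values by default (they become NA and are dropped), so treat it as an illustration, not as ggplot2's actual code path.

```r
# Hypothetical values for illustration; three of them exceed the limit of 15
value <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

# What aggregate() / median() sees: the full data
median_full <- median(value)

# Roughly what the boxplot stat sees after scale_y_continuous(limits = c(0, 15)):
# values outside the limits are turned into NA and then removed
censored <- ifelse(value < 0 | value > 15, NA, value)
median_censored <- median(censored, na.rm = TRUE)

print(c(full = median_full, censored = median_censored))
```

The two medians differ because the second one is computed from a truncated sample, which is the same mismatch seen between the text labels and the boxes.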

The solution is to omit the scale_y_continuous instruction and instead use:

d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) +
    facet_grid(station ~ .) +
    theme(legend.position = "none") +
    coord_cartesian(ylim = c(0, 15))

This allows ggplot2 to calculate the boxplot statistics using the entire dataset, while restricting only the visible range of the y-axis.
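As for diagnosing this kind of plot, one option is to inspect the statistics ggplot2 actually computed and compare them with the medians from aggregate. The sketch below assumes a reasonably recent ggplot2 (one that provides layer_data()) and uses a small synthetic data frame named toy, which is hypothetical and stands in for the CSV from the question.

```r
library(ggplot2)

# Synthetic, skewed data so that some values fall above the limit of 15
set.seed(1)
toy <- data.frame(
  year  = factor(rep(2010:2011, each = 50)),
  value = rexp(100, rate = 1 / 8)
)

# Same boxplot, limited in two different ways
p_scale <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, 15))   # drops out-of-range data before the stat

p_coord <- ggplot(toy, aes(x = year, y = value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 15))        # only zooms; the stat sees all data

# layer_data() returns the computed values for a layer;
# for a boxplot, the `middle` column is the drawn median
middle_scale <- layer_data(p_scale)$middle   # expect a "removed rows" warning here
middle_coord <- layer_data(p_coord)$middle

# Reference medians from the full data, as in the question
ref <- aggregate(value ~ year, data = toy, FUN = median)

print(data.frame(year = ref$year, reference = ref$value,
                 scale_limits = middle_scale, coord_limits = middle_coord))
```

If the coord_limits column matches the reference medians and the scale_limits column does not, the scale limits are what moved the median lines. The warning about removed rows that ggplot2 prints for the first plot is an early hint of the same problem.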
