In ggplot2, what do the ends of the boxplot lines represent?

Question

I can't find a description of what the end points of the lines of a boxplot represent.

For example, here are point values above and below where the lines end.

(I realize that the bottom and top of the box are the 25th and 75th percentiles, and the centerline is the 50th.) I assume that, as there are points above and below the lines, the line ends do not represent the max/min values.

Solution

The "dots" at the end of the boxplot represent outliers. There are a number of different rules for determining if a point is an outlier, but the method that R and ggplot use is the "1.5 rule". If a data point is:

  • less than Q1 - 1.5*IQR
  • greater than Q3 + 1.5*IQR

then that point is classed as an "outlier". The whiskers are defined as:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)

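lower whisker = max(min(x), Q_1 - 1.5 * IQR)

where IQR = Q_3 - Q_1, the box length. So the upper whisker is located at the smaller of the maximum x value and Q_3 + 1.5 IQR, whereas the lower whisker is located at the larger of the smallest x value and Q_1 - 1.5 IQR. Strictly speaking, the whisker is drawn to the most extreme data point that still lies within that limit, so it always ends at an observed value.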
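The rule above can be checked by hand. The following is a minimal sketch, not the code R runs internally: it assumes the hinges returned by `fivenum()` stand in for Q_1 and Q_3 (base R's `boxplot()` uses hinges, which can differ slightly from `quantile()` for even sample sizes), and the variable names are chosen only for illustration.

```r
# Sketch: apply the 1.5 * IQR rule by hand and compare with base R.
# Assumption: hinges from fivenum() play the role of Q_1 and Q_3.
set.seed(1)
x <- rlnorm(20, 1/2)                 # a skewed sample

hinges <- fivenum(x)[c(2, 4)]        # lower and upper hinge (box edges)
iqr    <- diff(hinges)               # box length

lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

is_outlier <- x < lower_fence | x > upper_fence
outliers   <- x[is_outlier]          # drawn as individual dots
whiskers   <- range(x[!is_outlier])  # most extreme points inside the fences

# boxplot.stats() computes the same summary; print both for comparison.
ref <- boxplot.stats(x, coef = 1.5)
print(whiskers)
print(ref$stats[c(1, 5)])
print(outliers)
print(ref$out)
```

The whisker ends come from `range(x[!is_outlier])`, so they are always actual data values rather than the fence values themselves.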

Additional information

  • See the Wikipedia boxplot page for alternative outlier rules.
  • There are actually a variety of ways of calculating quantiles. Have a look at `?quantile` for the description of the nine different methods.
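To see how much the choice of quantile method matters, the first quartile can be computed with each of the nine `type` values that `quantile()` accepts. This is only an illustrative sketch; the sample and the name `q1_by_type` are assumptions made for the example.

```r
# Sketch: first quartile of one sample under each of the nine quantile types.
set.seed(1)
x <- rlnorm(20, 1/2)

q1_by_type <- sapply(1:9, function(t) {
  quantile(x, probs = 0.25, type = t, names = FALSE)
})
names(q1_by_type) <- paste0("type", 1:9)

print(q1_by_type)   # type7 is the default used by quantile()
```

For a small sample the nine values typically differ a little, which is why box edges from different tools do not always match exactly.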

Example

Consider the following example:

> set.seed(1)
> x = rlnorm(20, 1/2)  # skewed data
> par(mfrow = c(1, 3))
> boxplot(x, range = 1.7, main = "range=1.7")
> boxplot(x, range = 1.5, main = "range=1.5")  # default
> boxplot(x, range = 0, main = "range=0")  # the same as range = "very big number"

This gives the following plot:

As we decrease range from 1.7 to 1.5 we reduce the length of the whisker. However, range=0 is a special case: the whiskers then extend to the data extremes, which is equivalent to "range=infinity".
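The effect of `range` can also be inspected numerically, without drawing anything. The sketch below assumes the same sample as the example and uses `plot = FALSE` so that `boxplot()` only returns its summary; the names `ranges` and `summaries` are illustrative.

```r
# Sketch: whisker ends and outlier counts for the three range settings.
set.seed(1)
x <- rlnorm(20, 1/2)

ranges    <- c(1.7, 1.5, 0)
summaries <- lapply(ranges, function(r) boxplot(x, range = r, plot = FALSE))

for (i in seq_along(ranges)) {
  s <- summaries[[i]]
  # Rows 1 and 5 of $stats hold the lower and upper whisker ends.
  cat("range =", ranges[i],
      " whiskers:", s$stats[1, 1], s$stats[5, 1],
      " outliers:", length(s$out), "\n")
}
```

With `range = 0` the whisker ends coincide with `min(x)` and `max(x)` and no points are flagged as outliers.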
