In ggplot2, what do the ends of the boxplot lines represent?


Question

I can't find a description of what the end points of the lines of a boxplot represent.

For example, here there are points above and below where the lines end.

(I realize that the bottom and top of the box are the 25th and 75th percentiles, and the center line is the 50th.) Since there are points above and below the lines, I assume the line ends do not represent the max/min values.

Recommended answer

The "dots" beyond the ends of the boxplot lines represent outliers. There are a number of different rules for determining whether a point is an outlier, but the method that R and ggplot2 use is the "1.5 rule". If a data point is:

  • less than Q1 - 1.5*IQR
  • greater than Q3 + 1.5*IQR

then that point is classed as an "outlier". The whiskers are defined as:

upper whisker = min(max(x), Q_3 + 1.5 * IQR)

lower whisker = max(min(x), Q_1 - 1.5 * IQR)

where IQR = Q_3 - Q_1, the box length. So the upper whisker is located at the smaller of the maximum x value and Q_3 + 1.5 IQR, whereas the lower whisker is located at the larger of the minimum x value and Q_1 - 1.5 IQR. Strictly speaking, each whisker is drawn to the most extreme data point that lies within these limits, so it can stop short of the limit itself.
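As a rough check of the definitions above, the whisker ends and outliers can be computed by hand in base R and compared with boxplot.stats(), the helper that boxplot() calls. This is a minimal sketch: the variable names are illustrative, the sample is the one from the answer's example, and ggplot2 may compute the quartiles slightly differently from the hinges used here.

```r
set.seed(1)
x <- rlnorm(20, 1/2)  # skewed sample, as in the answer's example

# boxplot.stats() returns: lower whisker, lower hinge, median,
# upper hinge, upper whisker (in $stats) and the outliers (in $out)
bs  <- boxplot.stats(x, coef = 1.5)
q1  <- bs$stats[2]
q3  <- bs$stats[4]
iqr <- q3 - q1

# Limits given by the 1.5 rule
upper_limit <- q3 + 1.5 * iqr
lower_limit <- q1 - 1.5 * iqr

# Whiskers end at the most extreme data points inside the limits
upper_whisker <- max(x[x <= upper_limit])
lower_whisker <- min(x[x >= lower_limit])

# Everything outside the limits is drawn as an individual point
outliers <- x[x > upper_limit | x < lower_limit]

print(c(lower = lower_whisker, upper = upper_whisker))
print(outliers)
```

The hand-computed values should agree with bs$stats[1], bs$stats[5] and bs$out.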

Additional information

  • See the Wikipedia boxplot page for alternative outlier rules.
  • There are actually several ways of calculating quantiles. See ?quantile for a description of the nine different methods.
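To see how much the choice of quantile definition matters, the sketch below computes the first quartile of a small sample under each of the nine type values documented in ?quantile, alongside the lower hinge from fivenum(). The sample and variable names are assumptions for illustration only.

```r
set.seed(1)
x <- rlnorm(20, 1/2)

# First quartile under each of the nine definitions in ?quantile
q1_by_type <- sapply(1:9, function(t) {
  quantile(x, 0.25, type = t, names = FALSE)
})
names(q1_by_type) <- paste0("type", 1:9)
print(round(q1_by_type, 3))

# Lower hinge from fivenum(), which boxplot.stats() uses for the box
lower_hinge <- fivenum(x)[2]
print(round(lower_hinge, 3))
```

On small samples the definitions can disagree noticeably, which shifts the box edges and therefore the 1.5*IQR limits.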

Example

Consider the following example:

set.seed(1)
x = rlnorm(20, 1/2)  # skewed data
par(mfrow=c(1,3))    # three plots side by side
boxplot(x, range=1.7, main="range=1.7")
boxplot(x, range=1.5, main="range=1.5")  # default
boxplot(x, range=0, main="range=0")      # whiskers extend to the data extremes

This gives the following plot (three side-by-side boxplots titled range=1.7, range=1.5 and range=0):

As we decrease range from 1.7 to 1.5, we reduce the length of the whiskers. However, range=0 is a special case: the whiskers extend to the data extremes, which is equivalent to "range=infinity".
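The special case can be checked numerically with boxplot.stats(), whose coef argument plays the role of range. A small sketch, assuming the same sample as in the example:

```r
set.seed(1)
x <- rlnorm(20, 1/2)

s17 <- boxplot.stats(x, coef = 1.7)
s15 <- boxplot.stats(x, coef = 1.5)
s0  <- boxplot.stats(x, coef = 0)

# A smaller coef can only shorten the whiskers, never lengthen them
print(c(upper_1.7 = s17$stats[5], upper_1.5 = s15$stats[5]))

# coef = 0: whiskers run to the data extremes and no outliers are reported
print(c(lower = s0$stats[1], upper = s0$stats[5]))
print(range(x))
print(length(s0$out))
```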
