In ggplot2, what do the ends of the boxplot lines represent?
Problem description
I can't find a description of what the end points of the lines of a boxplot represent.
For example, here there are points above and below where the lines end.
(I realize that the top and bottom of the box are the 75th and 25th percentiles, and the center line is the 50th.) Since there are points above and below the lines, I assume the lines do not represent the max/min values.
Recommended answer
The "dots" beyond the ends of the lines (the whiskers) represent outliers. There are a number of different rules for deciding whether a point is an outlier, but the method that R and ggplot2 use is the "1.5 rule". If a data point is:
- less than Q1 - 1.5*IQR, or
- greater than Q3 + 1.5*IQR,
then that point is classed as an "outlier". The whiskers are defined as:
upper whisker = the largest x value that is <= Q_3 + 1.5 * IQR
lower whisker = the smallest x value that is >= Q_1 - 1.5 * IQR
where IQR = Q_3 - Q_1, the box length. So each whisker runs from the box to the most extreme data point that still lies within 1.5 IQR of the box. The line therefore ends on an actual observation, not at Q_3 + 1.5 IQR or Q_1 - 1.5 IQR themselves; only when no point lies beyond these fences do the whiskers reach max(x) and min(x).
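To make the rule concrete, here is a minimal R sketch. `whisker_stats()` is a hypothetical helper written for illustration, not a ggplot2 or base R function. As far as I know it follows the same steps as ggplot2's stat_boxplot(): quartiles from quantile() with its default type 7, fences at 1.5 IQR beyond the box, and whisker ends at the most extreme points inside the fences.

```r
# Hypothetical helper (not a ggplot2 or base R function): applies the
# 1.5 * IQR rule the way ggplot2's stat_boxplot() is understood to do.
whisker_stats <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  qs <- unname(quantile(x, c(0.25, 0.5, 0.75)))  # Q1, median, Q3 (type 7)
  iqr <- qs[3] - qs[1]                            # box length
  fences <- c(qs[1] - coef * iqr, qs[3] + coef * iqr)
  is_out <- x < fences[1] | x > fences[2]         # outliers lie beyond the fences
  # Whiskers end at the most extreme non-outliers; including the quartiles
  # guards against an empty set when coef is very small.
  ends <- range(c(qs, x[!is_out]))
  list(lower = ends[1], q1 = qs[1], median = qs[2], q3 = qs[3],
       upper = ends[2], fences = fences, outliers = x[is_out])
}

s <- whisker_stats(c(1:10, 30))
s$fences    # -4 16
s$upper     # 10: the whisker stops at the last point inside the fence
s$outliers  # 30
```

With the data 1, 2, ..., 10, 30 the fences are -4 and 16, so 30 is drawn as an outlier and the upper whisker stops at 10 rather than at 16.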
Additional information
- See the Wikipedia boxplot page for alternative outlier rules.
- There are actually a variety of ways of calculating quantiles. Have a look at `?quantile` for a description of the nine different methods.
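Because the box itself depends on how the quartiles are computed, the fences do too. The sketch below uses a deliberately small sample, chosen only for illustration, and prints the 25th and 75th percentiles from all nine quantile() types next to Tukey's hinges from fivenum(), which is what base R's boxplot() uses. ggplot2, as I understand it, uses quantile()'s default type 7, so a ggplot2 box and a base R box of the same small sample can differ slightly.

```r
# Compare quartiles from quantile()'s nine types with Tukey's hinges from
# fivenum() on a small sample where they disagree.
x <- 1:6
types <- sapply(1:9, function(t) quantile(x, c(0.25, 0.75), type = t))
colnames(types) <- paste0("type", 1:9)
types                # type 7 (the default) gives 2.25 and 4.75
fivenum(x)[c(2, 4)]  # hinges used by base R's boxplot(): 2 and 5
```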
Example
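Look at the following example, which uses base R's boxplot() and its range argument (the multiplier in the 1.5 rule):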
set.seed(1)
x <- rlnorm(20, 1/2)  # skewed data
par(mfrow = c(1, 3))  # three plots side by side
boxplot(x, range = 1.7, main = "range=1.7")
boxplot(x, range = 1.5, main = "range=1.5")  # default
boxplot(x, range = 0, main = "range=0")      # the same as a very large range
This gives the following plot, with one boxplot per range value:
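As we decrease range from 1.7 to 1.5, the fences move closer to the box, so the whiskers can get shorter and more points can be drawn as outliers. However, range=0 is a special case: it is equivalent to "range=infinity", so the whiskers extend to the minimum and maximum of the data.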
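The effect of range can also be checked numerically rather than by eye. In base R, boxplot(range = k) hands k to boxplot.stats(coef = k), whose `$stats[1]` and `$stats[5]` are the whisker ends and whose `$out` holds the outliers. This sketch reuses the seeded data from the example; it lists no fixed numbers because they depend on the random draw. In ggplot2 the corresponding knob is the `coef` argument of stat_boxplot() (default 1.5), which can also be passed through geom_boxplot(); as far as I know ggplot2 does not treat coef = 0 as "infinite" the way base R treats range = 0, so a large coef is the safer way to get min/max whiskers there.

```r
# Whisker ends and outlier counts for the three range values in the example.
set.seed(1)
x <- rlnorm(20, 1/2)  # skewed data
tab <- do.call(rbind, lapply(c(1.7, 1.5, 0), function(k) {
  b <- boxplot.stats(x, coef = k)  # what boxplot(x, range = k) draws
  c(range = k, lower = b$stats[1], upper = b$stats[5], n_out = length(b$out))
}))
tab
# For range = 0 the whiskers sit at min(x) and max(x) and no outliers are reported.
```

A larger multiplier can only widen the fences, so its whiskers are never shorter than those for a smaller one.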