Labeling Outliers of Boxplots in R


Problem description

I have code that creates a boxplot using ggplot in R, and I want to label my outliers with the year and Battle.

Here is the code that creates my boxplot:

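library(ggplot2)
# Axis titles are set with labs(), not inside ggplot(), where they have no effect on the axes.
ggplot(seabattle, aes(x = PortugesOutcome, y = RatioPort2Dutch)) +
  geom_boxplot(outlier.size = 2, outlier.colour = "green") +
  # `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
  stat_summary(fun = "mean", geom = "point", shape = 23, size = 3, fill = "pink") +
  labs(x = "Outcome", y = "Ratio of Portuguese to Dutch/British ships") +
  ggtitle("Portuguese Sea Battles")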

Can anyone help? I know the plot itself is correct; I just want to label the outliers.

Recommended answer

The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.

Walking through the code: First, create a function, is_outlier, that returns TRUE or FALSE for each value passed to it, depending on whether that value lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. We then do the checking and the plotting in one pipeline. We group_by our grouping variable (cyl in this example; in your case this would be PortugesOutcome) so that outliers are judged within each box rather than across the whole data set. In the call to mutate we add a variable, outlier, that holds the drat value when it is an outlier (drat corresponds to RatioPort2Dutch in your example) and NA otherwise, so that nothing is drawn for ordinary points. Finally, we draw the boxplot and add the labels with geom_text, mapping the label aesthetic to the new variable. A negative hjust shifts the text slightly to the right, so the values appear next to the outlier points rather than on top of them.
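To make the outlier rule concrete, here is a small worked example on a made-up vector (the numbers are illustrative only and are not from the question's data). With R's default quantile method, the quartiles of this vector are 2 and 4, so the IQR is 2 and the fences sit at -1 and 7; only 100 falls outside them.

```r
# Same rule as in the answer: beyond 1.5 * IQR from the first or third quartile
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

x <- c(1, 2, 3, 4, 100)            # illustrative values only
lower_fence <- unname(quantile(x, 0.25) - 1.5 * IQR(x))  # 2 - 3 = -1
upper_fence <- unname(quantile(x, 0.75) + 1.5 * IQR(x))  # 4 + 3 = 7

flags <- is_outlier(x)
print(flags)                       # only the last value (100) is flagged
```

Because the function is vectorised, it can be applied to a whole column inside mutate, and group_by makes it run once per group.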

library(dplyr)
library(ggplot2)

# TRUE for values more than 1.5 * IQR below the first or above the third quartile
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

mtcars %>%
  group_by(cyl) %>%  # judge outliers within each box
  mutate(outlier = ifelse(is_outlier(drat), drat, NA_real_)) %>%  # keep the value only for outliers
  ggplot(., aes(x = factor(cyl), y = drat)) +
    geom_boxplot() +
    geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)  # rows with NA labels are dropped
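The question asks for labels that identify each outlier (year and Battle) rather than its numeric value. A minimal sketch of that variation, still on mtcars, stores an identifying string in the label column instead of the value; here the car name taken from the row names plays the role of the identifier. The column names car and outlier_label are introduced only for this illustration. For the question's data, the label could presumably be built with something like paste(Year, Battle), assuming seabattle has columns with those names, which the question does not confirm.

```r
library(dplyr)
library(ggplot2)

# Same 1.5 * IQR rule as in the answer
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

labelled <- mtcars %>%
  mutate(car = rownames(mtcars)) %>%   # hypothetical identifier column for this sketch
  group_by(cyl) %>%
  # Keep the identifier for outliers only; NA_character_ keeps the column a character vector
  mutate(outlier_label = ifelse(is_outlier(drat), car, NA_character_)) %>%
  ungroup()

p <- ggplot(labelled, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  geom_text(aes(label = outlier_label), na.rm = TRUE, hjust = -0.1)

print(p)
```

Only the expression inside ifelse changes compared with the answer; the grouping and the plotting layers stay the same.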
