Labeling Outliers of Boxplots in R
Question
I have code that creates a boxplot using ggplot2 in R. I want to label my outliers with the year and the battle.
Here is the code that creates my boxplot:
library(ggplot2)
ggplot(seabattle, aes(x = PortugesOutcome, y = RatioPort2Dutch)) +
  geom_boxplot(outlier.size = 2, outlier.colour = "green") +
  # mark the group mean; `fun` replaces the deprecated `fun.y` argument
  stat_summary(fun = "mean", geom = "point", shape = 23, size = 3, fill = "pink") +
  # axis labels go in labs(), not in the ggplot() call
  labs(x = "Outcome", y = "Ratio of Portuguese to Dutch/British ships") +
  ggtitle("Portuguese Sea Battles")
Can anyone help? I know the plot itself is correct; I just want to label the outliers.
Recommended answer
The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.
Walking through the code: first, create a function, is_outlier, that returns TRUE or FALSE for each value passed to it, depending on whether that value is an outlier. We then do the checking and plot the data. First we group_by our variable (cyl in this example; in your case this would be PortugesOutcome) and add a variable outlier in the call to mutate: if the drat value is an outlier (drat corresponds to RatioPort2Dutch in your example), we keep the drat value; otherwise we return NA so that nothing is plotted for that point. Finally, we plot the results and draw the text via geom_text, with the label aesthetic set to our new variable. We also offset the text with hjust (sliding it a bit to the right) so that the values appear next to, rather than on top of, the outlier points.
library(dplyr)
library(ggplot2)

# TRUE for values more than 1.5 * IQR beyond the quartiles
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

mtcars %>%
  group_by(cyl) %>%
  # keep drat only where it is an outlier within its cyl group; NA elsewhere
  mutate(outlier = ifelse(is_outlier(drat), drat, NA_real_)) %>%
  ggplot(aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  # na.rm = TRUE drops the NA labels without a warning
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
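The question asks for a descriptive label (year and battle) rather than the numeric value. The same approach seems to carry over if the label column holds text instead of the measurement. Below is a minimal sketch on mtcars, using the car name as a stand-in for that descriptive text. The names car, outlier_label, and labelled are introduced here for illustration only. For the seabattle data, the label would presumably be something like paste(Year, Battle), but those column names are an assumption, since the question does not show them.

```r
library(dplyr)
library(ggplot2)

# Same rule as in the answer: beyond 1.5 * IQR from the quartiles
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

labelled <- mtcars %>%
  # illustrative text column; stands in for e.g. paste(Year, Battle) (assumed names)
  mutate(car = rownames(mtcars)) %>%
  group_by(cyl) %>%
  # text label for outliers within each group, NA for everything else
  mutate(outlier_label = ifelse(is_outlier(drat), car, NA_character_)) %>%
  ungroup()

p <- ggplot(labelled, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  geom_text(aes(label = outlier_label), na.rm = TRUE, hjust = -0.1)
print(p)
```

Because hjust is measured relative to the width of the text, longer labels may need a smaller offset than the -0.3 used for the numeric labels.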