Plotly: Annotate outliers with sample names in boxplot

Question

I am trying to create a boxplot with ggplot and plotly using the airquality dataset, with Month on the x-axis and Ozone values on the y-axis. My aim is to annotate the plot so that hovering over an outlier point shows the Sample name in addition to the Ozone value:

library(tidyverse)
library(plotly)
library(datasets)
data(airquality)

# add months
airquality$Month <- factor(airquality$Month,
                           labels = c("May", "Jun", "Jul", "Aug", "Sep"))

# add sample names
airquality$Sample <- paste0('Sample_',seq(1:nrow(airquality)))

# boxplot
p <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot()
p <- plotly_build(p)
p

Here is the plot that's created:

By default, when I hover over each of the boxes, it shows the basic summary stats of the x-axis variable. However, I would also like to see which samples the outliers are. For example, when hovering over May, it shows the outlier value 115 but does not show that it is actually Sample_30.

How can I add the Sample variable to the outlier points so it shows both the outlier value as well as the sample name?

Solution

This method achieves the result you want, with one trade-off: the boxplot summary-statistics hover is no longer shown. It hides the outlier points and turns off hover on the boxplot layer, then overlays a geom_point layer containing only the outliers, with the sample name and Ozone value as hover info. Plotly's definition of outliers is given in its documentation. This method should work better than other solutions for more complex graphs (e.g. grouped side-by-side boxplots). Interestingly, the ggplotly boxplot for this data is not the same as the ggplot one: the upper whisker for Aug extends much further in ggplotly than in ggplot.

library(dplyr)
library(plotly)
library(datasets)
library(ggplot2)
data(airquality)

# manipulate data
mydata <- airquality %>%
    # add months
    mutate(Month = factor(Month, labels = c("May", "Jun", "Jul", "Aug", "Sep")),
    # add sample names
           Sample = paste0("Sample_", seq_len(n()))) %>%
    # flag outliers within each Month: values more than 1.5 * IQR
    # below the first quartile or above the third quartile
    group_by(Month) %>%
    mutate(OutlierFlag = ifelse(
        Ozone < quantile(Ozone, 0.25, na.rm = TRUE) - 1.5 * IQR(Ozone, na.rm = TRUE) |
        Ozone > quantile(Ozone, 0.75, na.rm = TRUE) + 1.5 * IQR(Ozone, na.rm = TRUE),
        "Outlier", "NotOutlier")) %>%
    ungroup()


# boxplot
p <- ggplot(mydata, aes(x = Month, y = Ozone)) +
    geom_boxplot()+
    geom_point(data=mydata %>% filter(OutlierFlag=="Outlier"),aes(group=Month,label1=Sample,label2=Ozone),size=2)

output = ggplotly(p, tooltip=c("label1","label2"))

# makes boxplot outliers invisible and hover info off
for (i in seq_along(output$x$data)){
    if (output$x$data[[i]]$type=="box"){
        output$x$data[[i]]$marker$opacity = 0  
        output$x$data[[i]]$hoverinfo = "none"
    }
}

# print end result of plotly graph
output
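
To see which samples the overlay layer will label, the flagging step can be checked on its own without building any plot. The following is a minimal base R sketch, assuming the usual 1.5 * IQR rule measured from the first and third quartiles; `flag_outliers` and `aq` are hypothetical names introduced here for illustration and are not part of plotly or ggplot2.

```r
library(datasets)

# Work on a fresh copy so the original dataset is left untouched.
aq <- datasets::airquality
aq$Sample <- paste0("Sample_", seq_len(nrow(aq)))

# Hypothetical helper: TRUE for values more than 1.5 * IQR below the
# first quartile or above the third quartile (assumed outlier rule).
# Missing values are never flagged.
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Apply the rule within each month; ave() keeps the original row order.
# ave() returns 0/1 here because Ozone is numeric, so convert back to logical.
aq$Outlier <- as.logical(ave(aq$Ozone, aq$Month, FUN = flag_outliers))

# Table of flagged samples with their month and Ozone value.
outliers <- aq[aq$Outlier, c("Month", "Sample", "Ozone")]
print(outliers)
```

The resulting table should include Sample_30 with an Ozone value of 115, the May outlier mentioned in the question. Because ggplotly can draw its whiskers differently from ggplot, the points flagged here may not match the plotly box trace exactly.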
