Cannot place count label at boxplot whisker with outliers present

Problem description

I am trying to place labels of observations counts at the ends of boxplot whiskers, but it doesn't seem to work when there are outliers.

I have attempted to compare the max/min values with what I believe is the calculated whisker end [quartile 1 - 1.5 * interquartile range, or quartile 3 + 1.5 * interquartile range]. But the labels get placed at neither the max/min value nor the whisker end.

Example using mtcars and y-axis reversed to demonstrate:

library(ggplot2)
library(dplyr)

mtcars %>%
  select(qsec, cyl, am) %>%
  ggplot(aes(factor(cyl), qsec, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") + ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  scale_y_reverse() +
  geom_text(data = mtcars %>%
              select(qsec, cyl, am) %>%
              group_by(cyl, am) %>%
              summarize(min_qsec = min(qsec), Count = n(), med = median(qsec),
                        q1 = quantile(qsec, 0.25),
                        q3 = quantile(qsec, 0.75), iqr = IQR(qsec),
                        qsec = mean(qsec),
                        lab_pos = max(min_qsec, q1 - 1.5 * iqr)),
            aes(y = lab_pos, label = Count), position = position_dodge(width = 0.75))

This produces a plot in which the labels for am(1) at cyl(4) and am(0) at cyl(8) are misaligned.

Is my calculation for lab_pos incorrect, or is there a better approach to position labels at the whisker ends, regardless of outliers? I would like to accomplish it using ggplot2 and dplyr, if possible.

Solution

If I understand correctly, this is what you want:

library(ggplot2)
library(dplyr)

## Per-group summary; lab_pos is the smallest observation on or inside the lower fence
label_data <- mtcars %>%
  select(qsec, cyl, am) %>%
  group_by(cyl, am) %>%
  summarize(min_qsec = min(qsec),
            Count = n(),
            med = median(qsec),
            q1 = quantile(qsec, 0.25),
            q3 = quantile(qsec, 0.75),
            iqr = IQR(qsec),
            lab_pos = min(ifelse(qsec >= q1 - 1.5 * iqr, qsec, NA), na.rm = TRUE),
            qsec = mean(qsec))

mtcars %>%
  select(qsec, cyl, am) %>%
  ggplot(aes(factor(cyl), qsec, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") + ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  scale_y_reverse() +
  geom_text(data = label_data, aes(y = lab_pos, label = Count),
            position = position_dodge(width = 0.75), vjust = 0, fontface = "bold")

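The whiskers extend to the furthest data point within the fence (Q1 - 1.5 * IQR or Q3 + 1.5 * IQR), not to the fence itself. That is why max(min_qsec, q1 - 1.5 * iqr) lands on neither the minimum nor the whisker end whenever an outlier is present.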
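To make the difference between the fence and the whisker end concrete, here is a minimal base-R sketch. The helper name whisker_ends and the toy vector x are made up for illustration; the quartiles come from quantile() with its default settings, which I assume matches what the plot uses.

```r
# Hypothetical helper (not part of ggplot2 or dplyr): returns the lowest and
# highest observations that lie on or inside the 1.5 * IQR fences.
whisker_ends <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[2] + coef * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]
  c(lower = min(inside), upper = max(inside))
}

# Toy data with one low outlier (1) and one high outlier (30)
x <- c(1, 10, 11, 12, 13, 14, 30)

# Q1 = 10.5, Q3 = 13.5, IQR = 3, so the fences are at 6 and 18
ends <- whisker_ends(x)
ends[["lower"]]  # 10: the furthest point inside the lower fence
ends[["upper"]]  # 14: the furthest point inside the upper fence

# The formula from the question returns the fence itself (6),
# which is neither the minimum (1) nor the whisker end (10)
fence_guess <- max(min(x), quantile(x, 0.25, names = FALSE) - 1.5 * IQR(x))
fence_guess
```

Inside a grouped summarize() call, the same idea could be written as lab_pos = whisker_ends(qsec)[["lower"]].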
