Cannot place count label at boxplot whisker with outliers present
Problem description
I am trying to place labels showing observation counts at the ends of boxplot whiskers, but it doesn't seem to work when there are outliers.
I have attempted to compare the max/min values with what I believe is the calculated whisker length [quartile 1 (or quartile 3) - (or +) 1.5 * interquartile range]. But the labels get placed at neither the max/min value nor the whisker end.
Example using mtcars, with the y-axis reversed to demonstrate:
library(ggplot2)
library(dplyr)  # library(ggplot2, dplyr) attaches only ggplot2; the second argument of library() is `help`

mtcars %>%
  select(qsec, cyl, am) %>%
  ggplot(aes(factor(cyl), qsec, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") +  ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  scale_y_reverse() +
  geom_text(data = mtcars %>%
              select(qsec, cyl, am) %>%
              group_by(cyl, am) %>%
              summarize(min_qsec = min(qsec), Count = n(), med = median(qsec),
                        q1 = quantile(qsec, 0.25),
                        q3 = quantile(qsec, 0.75), iqr = IQR(qsec),
                        qsec = mean(qsec),
                        lab_pos = max(min_qsec, q1 - 1.5 * iqr)),
            aes(y = lab_pos, label = Count), position = position_dodge(width = 0.75))
This produces a plot in which the labels for am(1) at cyl(4) and am(0) at cyl(8) are misaligned.
Is my calculation for lab_pos incorrect, or is there a better approach to position labels at the whisker ends, regardless of outliers? I would like to accomplish it using ggplot2 and dplyr, if possible.
If I understand correctly, this is what you want:
label_data <- mtcars %>%
  select(qsec, cyl, am) %>%
  group_by(cyl, am) %>%
  summarize(min_qsec = min(qsec),
            Count = n(),
            med = median(qsec),
            q1 = quantile(qsec, 0.25),
            q3 = quantile(qsec, 0.75),
            iqr = IQR(qsec),
            lab_pos = min(ifelse(qsec > q1 - 1.5 * iqr, qsec, NA), na.rm = TRUE),
            qsec = mean(qsec))

mtcars %>%
  select(qsec, cyl, am) %>%
  ggplot(aes(factor(cyl), qsec, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") +  ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  scale_y_reverse() +
  geom_text(data = label_data, aes(y = lab_pos, label = Count),
            position = position_dodge(width = 0.75), vjust = 0, fontface = "bold")
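The whiskers extend to the furthest data point that lies within the fence (Q1 - 1.5 * IQR at the lower end, Q3 + 1.5 * IQR at the upper end), not to the fence itself. That is why max(min_qsec, q1 - 1.5 * iqr) lands on the fence, between the outlier and the whisker end, whenever a group has a low outlier.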
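To see the difference between the fence and the whisker end on a small vector, here is a minimal base-R sketch. The helper name whisker_ends() and the sample data are made up for illustration and are not part of ggplot2 or dplyr. It assumes the default quantile() definition and treats points lying exactly on a fence as non-outliers, which as far as I can tell is how ggplot2's boxplot statistics behave.

```r
# Hypothetical helper (not a ggplot2/dplyr function): returns the values
# at which boxplot whiskers end for a numeric vector x.
whisker_ends <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  iqr <- diff(q)
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[2] + coef * iqr
  # Keep only the points inside the fences; the whiskers stop at the
  # most extreme of these points, not at the fences themselves.
  inside <- x[x >= lower_fence & x <= upper_fence]
  c(lower = min(inside), upper = max(inside))
}

# Made-up data with one low and one high outlier
x <- c(1, 10, 11, 12, 13, 14, 30)
whisker_ends(x)
# Q1 = 10.5, Q3 = 13.5, so the fences are 6 and 18,
# but the whiskers end at the data points 10 and 14.
```

For the lower whisker, the same idea inside summarize() is what min(ifelse(qsec > q1 - 1.5 * iqr, qsec, NA), na.rm = TRUE) computes in the answer's code.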