How can I identify the labels of outliers in an R boxplot?


Question

The R boxplot function is a very useful way to look at data: it quickly gives you a visual summary of the approximate location and spread of your data, and of the number of outliers. In addition, I'd like to identify the outliers, so that I can quickly find problems in the dataset.

The values of these outliers can be accessed using myplot$out. Unfortunately, the labels of these outliers do not seem to be available. There are some packages aimed at displaying the labels on the plot itself: http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/, but they don't work well. I just want to list these outliers; I don't need them to appear on the plot itself.

Any ideas?

Recommended answer

You've done most of the hard work yourself. All that remains is a comparison:

## First create some data
## (You should include this in your question)
set.seed(2)
dd = data.frame(x = rlnorm(26), y = LETTERS)

Get the outliers

outliers = boxplot(dd$x, plot=FALSE)$out

Extract the outliers from the original data frame

dd[dd$x %in% outliers,]
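
If only the labels are wanted, rather than whole rows, the same logical comparison can index the label column directly. The following is a minimal sketch on a small made-up data frame; the names toy, toy_out, out_labels and out_rows are illustrative and not part of the original answer.

```r
# Hypothetical toy data: five ordinary values and one obvious outlier
toy <- data.frame(x = c(1, 2, 3, 4, 5, 100), y = LETTERS[1:6])

# Outlier values as reported by boxplot(), without drawing the plot
toy_out <- boxplot(toy$x, plot = FALSE)$out

# Labels and row positions of the rows whose value is an outlier
out_labels <- toy$y[toy$x %in% toy_out]
out_rows <- which(toy$x %in% toy_out)

print(out_labels)
print(out_rows)
```

Note that %in% compares by value, so every row whose x equals an outlier value is selected, including repeated values.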


Further explanation:

The variable dd$x is a vector of 26 numbers. The variable outliers contains the values of the outliers (just type dd$x and outliers in your R console to see them). The command

dd$x %in% outliers

checks each value of dd$x against the values in outliers, viz:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE <snip>

The square bracket notation, dd[dd$x %in% outliers,], returns the rows of the data frame dd for which dd$x %in% outliers is TRUE.
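
As an aside, the logical vector can also be built without matching on values, by applying the rule that boxplot uses by default: a point is flagged when it lies more than 1.5 box lengths beyond a hinge. The sketch below assumes the default setting (range = 1.5) and uses made-up data; the names hinges, reach and is_outlier are illustrative only.

```r
# Hypothetical toy data, not from the original question
x <- c(1, 2, 3, 4, 5, 100)

# Lower and upper hinges (the box edges) from Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

# Whiskers may extend up to 1.5 box lengths beyond each hinge (default range)
reach <- 1.5 * diff(hinges)

# TRUE where a value falls outside the whisker limits
is_outlier <- x < hinges[1] - reach | x > hinges[2] + reach

# Both lines should report the same outlier values
print(x[is_outlier])
print(boxplot.stats(x)$out)
```

A vector such as is_outlier can be used inside the square brackets in the same way as dd$x %in% outliers.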
