How can I see multiple variables' outliers in one boxplot using R?

Question

I am a newbie to R. I have a question. For checking the outlier of a variable we generally use:

boxplot(train$rate)

Suppose rate is a variable in my dataset and train is the dataset's name. But when I have many variables, say 100 or 150, checking each variable's outliers one by one is very time-consuming. Is there a function that shows the outliers of all 100 variables in one boxplot?

If yes, which function can remove the outliers of all those variables at once instead of one by one? Please help me solve this problem.

Thanks in advance.

Recommended answer

I agree with Rui Barradas that it is bad practice to remove outliers without further thought. As long as the value is valid you should keep it in your data or at least run two separate analyses with and without the influential value. You could use a for loop to apply a function to every variable in your dataset.

train2 <- train      # Copy the original dataset
outvalue <- list()   # Create two empty lists
outindex <- list()
for (i in seq_len(ncol(train2))) {  # For every (numeric) column in the dataset
  outvalue[[i]] <- boxplot(train2[, i])$out  # Plot and get the outlier values
  outindex[[i]] <- which(train2[, i] %in% outvalue[[i]])  # Get the outlier indices
  train2[outindex[[i]], i] <- NA  # Replace the outliers with NA
}
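
As a side note on the "one boxplot" part of the question: passing a whole data frame to boxplot() draws one box per column in a single plot. The sketch below is only an illustration under assumptions. The data frame is made up to stand in for train, and the column names rate, price and qty are hypothetical.

```r
# Hypothetical example data standing in for `train` (not from the original post)
set.seed(1)
train <- data.frame(
  rate  = c(rnorm(50), 8),        # 8 is a planted extreme value
  price = c(rnorm(50, 10), 25),   # 25 is a planted extreme value
  qty   = rnorm(51)
)

# Keep only the numeric columns; boxplot() on a data frame draws one box per column
num <- train[sapply(train, is.numeric)]
bp <- boxplot(num, las = 2, main = "All numeric variables")

# bp$out holds the outlier values and bp$group the column index of each one,
# so the outliers can be split back out per variable
outliers <- split(bp$out, factor(bp$names[bp$group], levels = bp$names))
outliers
```

If the variables are on very different scales, plotting scale(num) instead of num may make the boxes easier to compare, at the cost of losing the original units.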

The boxplot loop works and plots the data, but it is quite slow. If you don't want to plot the data and just want the outliers, you could look into other outlier functions. The extremevalues package has a function that takes a different approach to identifying outliers and doesn't require a plot. The following uses the getOutliers function from the extremevalues package:

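library(extremevalues)  # Provides getOutliers()
outRight <- list()
outLeft <- list()
for (i in seq_len(ncol(train2))) {
  out <- getOutliers(train2[, i])  # Detect outliers once per column
  outRight[[i]] <- out$iRight      # Indices of right-tail outliers
  outLeft[[i]] <- out$iLeft        # Indices of left-tail outliers
  train2[outRight[[i]], i] <- NA
  train2[outLeft[[i]], i] <- NA
}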
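
For the advice about running the analysis with and without the influential values, a base R alternative is boxplot.stats(), which applies the same rule as boxplot() without drawing anything. This is a minimal sketch, not the answer author's method. The data frame and its columns a and b are made up for illustration.

```r
# Hypothetical data standing in for `train` (not from the original post)
set.seed(42)
train <- data.frame(a = c(rnorm(30), 10), b = c(rnorm(30), -12))

# boxplot.stats() uses the boxplot rule (1.5 * IQR by default) without plotting;
# the result is a logical matrix with one column per variable
is_out <- sapply(train, function(x) x %in% boxplot.stats(x)$out)

colSums(is_out)            # number of flagged values per variable

train_clean <- train
train_clean[is_out] <- NA  # masked copy; the original `train` is left untouched

# Compare a summary with and without the flagged values
colMeans(train)
colMeans(train_clean, na.rm = TRUE)
```

Keeping both train and train_clean makes it easy to check how much the flagged values actually change a result before deciding whether to drop them.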