Remove outliers by group in R

Published: 2020/10/17 2:09:33 | Tags: r, dataframe, dplyr


Question

In my dataset, I must delete outliers for each group separately. Here is my dataset:

vpg=structure(list(customer = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L), code = c(2L, 2L, 3L, 3L, 4L, 4L, 
5L, 5L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L), year = c(2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L), stuff = c(10L, 20L, 30L, 
40L, 50L, 60L, 70L, 80L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L
), action = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 
0L, 1L, 0L, 1L)), .Names = c("customer", "code", "year", "stuff", 
"action"), class = "data.frame", row.names = c(NA, -16L))

I must delete outliers from the stuff variable, but separately for each customer + code + year group.
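To make "separately for each customer + code + year group" concrete, here is a small base R sketch (it uses split(), which is not part of the question) that lists the stuff values belonging to each customer/code/year combination. In the sample data every combination that occurs has exactly two rows, one with action 0 and one with action 1.

```r
# Dataset from the question (output of dput)
vpg <- structure(list(customer = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), code = c(2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L), year = c(2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L), stuff = c(10L, 20L, 30L,
40L, 50L, 60L, 70L, 80L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L
), action = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 1L)), .Names = c("customer", "code", "year", "stuff",
"action"), class = "data.frame", row.names = c(NA, -16L))

# One vector of stuff values per customer.code.year combination;
# drop = TRUE discards combinations that never occur (e.g. customer 1 with code 4)
groups <- split(vpg$stuff, list(vpg$customer, vpg$code, vpg$year), drop = TRUE)

# Number of rows in each group
print(lengths(groups))
```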

I found this handy function:

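# Replace values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with NA
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)  # first and third quartiles
  H <- 1.5 * IQR(x, na.rm = na.rm)                             # 1.5 times the interquartile range
  y <- x
  y[x < (qnt[1] - H)] <- NA  # below the lower fence
  y[x > (qnt[2] + H)] <- NA  # above the upper fence
  y
}

# Applied to the whole stuff column at once, i.e. ignoring the groups
new <- remove_outliers(vpg$stuff)
vpg <- cbind(new, vpg)
View(vpg)  # opens the data viewer (interactive sessions only)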
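To show what remove_outliers() actually does, here is a worked example on a made-up vector (the values are illustrative, not from the question). With R's default quantile type, the quartiles are 2 and 4, so IQR = 2 and H = 3, giving fences at -1 and 7; only 100 falls outside them. Note that the function keeps the vector length and marks outliers as NA rather than dropping them.

```r
# remove_outliers() as in the question
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

x <- c(1, 2, 3, 4, 100)           # illustrative values
print(quantile(x, c(.25, .75)))   # 25%: 2, 75%: 4, so IQR = 2 and H = 3
print(remove_outliers(x))         # returns c(1, 2, 3, 4, NA): 100 is above 4 + 3 = 7
```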

But it works on all groups at once. How can I use this function to delete outliers within each group and get a clean dataset for further work? Note that this dataset also contains the variable action, which takes the values 0 and 1. It is not a grouping variable, but outliers must be deleted only in the zero (0) category of action.
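A minimal base R sketch of one way to do this, assuming that "clean dataset" means dropping the outlying rows rather than keeping them as NA. It runs the question's remove_outliers() through ave() inside each customer/code/year group, computes the quartiles from the action == 0 rows only, and never touches action == 1 rows. The helper name remove_outliers_by_group() is hypothetical.

```r
# remove_outliers() as in the question (1.5 * IQR rule)
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Hypothetical helper: drop outlying `stuff` values per customer+code+year group,
# testing only rows with action == 0; rows with action == 1 are always kept.
remove_outliers_by_group <- function(df) {
  zero <- df$action == 0
  flagged <- rep(FALSE, nrow(df))
  # ave() applies remove_outliers() separately within each customer/code/year
  # combination of the action == 0 subset, returning values in the original order
  cleaned <- ave(df$stuff[zero],
                 df$customer[zero], df$code[zero], df$year[zero],
                 FUN = remove_outliers)
  # an action == 0 row is an outlier if remove_outliers() turned it into NA
  flagged[zero] <- is.na(cleaned) & !is.na(df$stuff[zero])
  df[!flagged, , drop = FALSE]
}

# Dataset from the question
vpg <- structure(list(customer = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L), code = c(2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L), year = c(2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L), stuff = c(10L, 20L, 30L,
40L, 50L, 60L, 70L, 80L, 10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L
), action = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 0L, 1L)), .Names = c("customer", "code", "year", "stuff",
"action"), class = "data.frame", row.names = c(NA, -16L))

vpg_clean <- remove_outliers_by_group(vpg)
```

In this sample every customer/code/year group contains a single action == 0 row, so its IQR is 0, the fences coincide with that value, and nothing is removed: vpg_clean still has all 16 rows. The rule only has something to work with on groups that contain several action == 0 values.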
