Remove group of rows by flag indicator in R

Question

I have a dataframe where I have groups of numbers in the unique3 column.

DF <- structure(list(unique1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                    1L, 1L, 1L), .Label = c("11/1/2016", "11/10/2016", "11/11/2016", 
                                                            "11/12/2016", "11/13/2016", "11/14/2016", "11/15/2016", "11/16/2016", 
                                                            "11/17/2016", "11/18/2016", "11/19/2016", "11/2/2016", "11/20/2016", 
                                                            "11/21/2016", "11/22/2016", "11/23/2016", "11/24/2016", "11/25/2016", 
                                                            "11/26/2016", "11/27/2016", "11/28/2016", "11/3/2016", "11/4/2016", 
                                                            "11/5/2016", "11/6/2016", "11/7/2016", "11/8/2016", "11/9/2016"
                                    ), 
                        class = "factor"), unique2 = c(21L, 21L, 21L, 21L, 21L, 21L, 
                          21L, 21L, 31L, 41L), unique3 = c(100001L, 100001L, 100001L, 100001L, 
                                                       100001L, 100001L, 100001L, 100001L, 100002L, 100003L), 
               flag = c(NA_integer_,1, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
                           NA_integer_, NA_integer_, NA_integer_, NA_integer_), value = c(1L, 
                                                                                      6L, 18L, 19L, 22L, 29L, 30L, 32L, 1L, 1L)), 
          .Names = c("unique1","unique2", "unique3", "flag", "value"), row.names = c(NA, 10L), class = "data.frame")

     unique1 unique2 unique3 flag value
1  11/1/2016      21  100001   NA     1
2  11/1/2016      21  100001    1     6
3  11/1/2016      21  100001   NA    18
4  11/1/2016      21  100001   NA    19
5  11/1/2016      21  100001   NA    22
6  11/1/2016      21  100001   NA    29
7  11/1/2016      21  100001   NA    30
8  11/1/2016      21  100001   NA    32
9  11/1/2016      31  100002   NA     1
10 11/1/2016      41  100003   NA     1

I basically need to group by the unique3 column, and if any of the rows for 100001 has a 1 in flag, all rows of that group would be removed. Note that 100001 may not be unique and may repeat for a different value of unique2.

What I would do is set flag to 1 for every row of that unique3 group, like so:

     unique1 unique2 unique3 flag value
1  11/1/2016      21  100001   1     1
2  11/1/2016      21  100001   1     6
3  11/1/2016      21  100001   1    18
4  11/1/2016      21  100001   1    19
5  11/1/2016      21  100001   1    22
6  11/1/2016      21  100001   1    29
7  11/1/2016      21  100001   1    30
8  11/1/2016      21  100001   1    32
9  11/1/2016      31  100002   NA     1
10 11/1/2016      41  100003   NA     1

and then group by and filter to have:

     unique1 unique2 unique3 flag value
1  11/1/2016      21  100001   1     1
2  11/1/2016      21  100001   1     6
3  11/1/2016      21  100001   1    18
4  11/1/2016      21  100001   1    19
5  11/1/2016      21  100001   1    22
6  11/1/2016      21  100001   1    29
7  11/1/2016      21  100001   1    30
8  11/1/2016      21  100001   1    32

Recommended answer

For the first step (applying the flag uniformly to each group):

DF$flag <- ave(DF$flag, DF$unique3, FUN = function(x) max(c(0,x), na.rm=TRUE))

Then you can filter a few different ways. One option is:

subset(DF, flag == 1)
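
If the goal is instead to drop the flagged groups, as the question title suggests, the same filled-in flag can be filtered the other way. A minimal sketch, assuming a small stand-in data frame `toy` (hypothetical, not the poster's data) that has the same `unique3` and `flag` columns:

```r
# Hypothetical stand-in for DF: group 100001 has one flagged row.
toy <- data.frame(
  unique3 = c(100001L, 100001L, 100001L, 100002L, 100003L),
  flag    = c(NA, 1, NA, NA, NA),
  value   = c(1L, 6L, 18L, 1L, 1L)
)

# Spread the flag over each unique3 group: 1 if any row is flagged, else 0.
toy$flag <- ave(toy$flag, toy$unique3,
                FUN = function(x) max(c(0, x), na.rm = TRUE))

flagged_only <- subset(toy, flag == 1)  # keep only the flagged groups
unflagged    <- subset(toy, flag == 0)  # drop the flagged groups instead
```

Because the fill step replaces every NA with 0 or 1, both filters are plain comparisons and no NA handling is needed afterwards.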


How it works

ave(v, g1, g2, g3, FUN = f) splits up vector v based on grouping variables; applies a function to each subvector; recombines to return a vector with the same class as v.

max(c(0,x), na.rm=TRUE) removes the NA values, adds a 0 value and then takes the max. If x only contains 1s and NAs, this will return a 1 if x contains any 1 and otherwise returns 0.
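
The two pieces can be tried on small vectors. The sketch below uses made-up values (`x_flagged`, `x_clean`, `g2`, `g3` and `f` are illustrative names, not part of the original data) and also shows `ave` with two grouping variables, which may be what is wanted when the same unique3 value repeats under a different unique2:

```r
x_flagged <- c(NA, 1, NA)  # a group with one flagged row
x_clean   <- c(NA, NA)     # a group with no flagged rows

any_flag   <- max(c(0, x_flagged), na.rm = TRUE)  # 1
none_flag  <- max(c(0, x_clean), na.rm = TRUE)    # 0
# Without the extra 0, an all-NA group has nothing left after na.rm = TRUE,
# and max() would return -Inf with a warning.

# Same unique3 value under two different unique2 values (illustrative data).
g2 <- c(21L, 21L, 31L, 31L)
g3 <- c(100001L, 100001L, 100001L, 100001L)
f  <- c(NA, 1, NA, NA)

# Grouping by both variables keeps the flag from leaking across unique2.
filled <- ave(f, g2, g3, FUN = function(x) max(c(0, x), na.rm = TRUE))
# filled is 1 1 0 0
```

Grouping by unique3 alone would instead mark all four rows as flagged, so the choice of grouping variables depends on whether a repeated unique3 under another unique2 counts as the same group.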

Some alternatives using packages

library(data.table)
DT = setDT(copy(DF))

DT[, flag := max(c(0,flag), na.rm=TRUE), by=unique3][ flag == 1 ] 

# or...
library(dplyr)
library(magrittr)  # provides %<>%, which dplyr does not export
DF2 = DF

(DF2 %<>% 
  group_by(unique3) %>% 
  mutate(flag = max(c(0,flag), na.rm=TRUE))
) %>% filter(flag == 1)

(I'm only creating the DF2 and DT objects here so the code can be run directly without conflicting edits on DF.)
