Remove group of rows by flag indicator in R
Question
I have a dataframe where I have groups of numbers in the unique3 column.
DF <- structure(list(unique1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("11/1/2016", "11/10/2016", "11/11/2016",
"11/12/2016", "11/13/2016", "11/14/2016", "11/15/2016", "11/16/2016",
"11/17/2016", "11/18/2016", "11/19/2016", "11/2/2016", "11/20/2016",
"11/21/2016", "11/22/2016", "11/23/2016", "11/24/2016", "11/25/2016",
"11/26/2016", "11/27/2016", "11/28/2016", "11/3/2016", "11/4/2016",
"11/5/2016", "11/6/2016", "11/7/2016", "11/8/2016", "11/9/2016"
),
class = "factor"), unique2 = c(21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 31L, 41L), unique3 = c(100001L, 100001L, 100001L, 100001L,
100001L, 100001L, 100001L, 100001L, 100002L, 100003L),
flag = c(NA_integer_,1, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), value = c(1L,
6L, 18L, 19L, 22L, 29L, 30L, 32L, 1L, 1L)),
.Names = c("unique1","unique2", "unique3", "flag", "value"), row.names = c(NA, 10L), class = "data.frame")
unique1 unique2 unique3 flag value
1 11/1/2016 21 100001 NA 1
2 11/1/2016 21 100001 1 6
3 11/1/2016 21 100001 NA 18
4 11/1/2016 21 100001 NA 19
5 11/1/2016 21 100001 NA 22
6 11/1/2016 21 100001 NA 29
7 11/1/2016 21 100001 NA 30
8 11/1/2016 21 100001 NA 32
9 11/1/2016 31 100002 NA 1
10 11/1/2016 41 100003 NA 1
I basically need to group by the unique3 column: if any of the rows for a group such as 100001 has a 1 in flag, the rows of that group would be removed. Note that 100001 may not be unique and may repeat for a different value of unique2.
What I would do is first set flag to 1 on every row of a unique3 group that contains a 1, like so:
unique1 unique2 unique3 flag value
1 11/1/2016 21 100001 1 1
2 11/1/2016 21 100001 1 6
3 11/1/2016 21 100001 1 18
4 11/1/2016 21 100001 1 19
5 11/1/2016 21 100001 1 22
6 11/1/2016 21 100001 1 29
7 11/1/2016 21 100001 1 30
8 11/1/2016 21 100001 1 32
9 11/1/2016 31 100002 NA 1
10 11/1/2016 41 100003 NA 1
and then group by unique3 and filter to have:
unique1 unique2 unique3 flag value
1 11/1/2016 21 100001 1 1
2 11/1/2016 21 100001 1 6
3 11/1/2016 21 100001 1 18
4 11/1/2016 21 100001 1 19
5 11/1/2016 21 100001 1 22
6 11/1/2016 21 100001 1 29
7 11/1/2016 21 100001 1 30
8 11/1/2016 21 100001 1 32
Recommended answer
For the first step (applying the flag uniformly to each group):
DF$flag <- ave(DF$flag, DF$unique3, FUN = function(x) max(c(0,x), na.rm=TRUE))
Then you can filter in a few different ways. One option is:
subset(DF, flag == 1)
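As a minimal sketch of the two steps end to end, the snippet below runs them on a small made-up data frame. The toy `DF` and the names `kept` and `dropped` are hypothetical and only mirror the question's column names. It also shows the opposite filter, in case the goal is to drop the flagged groups rather than keep them.

```r
# Toy data: hypothetical values that mimic the question's columns
DF <- data.frame(
  unique3 = c(100001L, 100001L, 100001L, 100002L, 100003L),
  flag    = c(NA, 1, NA, NA, NA),
  value   = c(1L, 6L, 18L, 1L, 1L)
)

# Step 1: spread the flag across each unique3 group (1 if any row has a 1, else 0)
DF$flag <- ave(DF$flag, DF$unique3, FUN = function(x) max(c(0, x), na.rm = TRUE))

# Step 2a: keep only the flagged groups, as in the expected output
kept <- subset(DF, flag == 1)

# Step 2b: or drop the flagged groups instead
dropped <- subset(DF, flag == 0)
```

After step 1 the flag column contains only 0s and 1s, so either comparison is safe from NA surprises.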
How it works
ave(v, g1, g2, g3, FUN = f) splits up vector v based on the grouping variables; applies a function to each subvector; and recombines the results to return a vector with the same class as v.
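To make the split/apply/recombine behaviour concrete, here is a small illustration on made-up vectors (`v`, `g`, and the result names are assumptions for this example only):

```r
v <- c(1, 2, 3, 10, 20)
g <- c("a", "a", "a", "b", "b")

# A FUN that returns one value per group: the value is repeated
# for every element of that group, so the result has length(v)
group_means <- ave(v, g, FUN = mean)

# A FUN that returns one value per element also works
group_cumsum <- ave(v, g, FUN = cumsum)
```

Here `group_means` holds 2 for the three "a" elements and 15 for the two "b" elements, which is the same recycling that spreads a single flag value over a whole group.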
max(c(0,x), na.rm=TRUE) removes the NA values, adds a 0 value and then takes the max. If x only contains 1s and NAs, this returns 1 if x contains any 1 and otherwise returns 0.
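A quick check of why the extra 0 matters, using two made-up input vectors (all names below are hypothetical):

```r
all_na   <- c(NA_real_, NA_real_, NA_real_)
has_flag <- c(NA, 1, NA)

# The leading 0 guarantees at least one non-NA value,
# so an all-NA group yields 0 and a flagged group yields 1
flag_none <- max(c(0, all_na), na.rm = TRUE)
flag_some <- max(c(0, has_flag), na.rm = TRUE)

# Without the 0, nothing is left after NA removal and max() returns -Inf
# (with a warning, silenced here)
no_zero <- suppressWarnings(max(all_na, na.rm = TRUE))
```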
Some alternatives with packages
library(data.table)
DT = setDT(copy(DF))
DT[, flag := max(c(0,flag), na.rm=TRUE), by=unique3][ flag == 1 ]
# or...
library(dplyr)
library(magrittr)  # provides the %<>% compound assignment pipe
DF2 = DF
(DF2 %<>%
  group_by(unique3) %>%
  mutate(flag = max(c(0,flag), na.rm=TRUE))
) %>% filter(flag == 1)
(I'm only creating the DF2 and DT objects here so the code can be run directly without conflicting edits on DF.)
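If overwriting the flag column is not wanted, a different base R approach (not part of the answer above, just a sketch on a hypothetical toy `DF`) is to collect the ids of the flagged groups and filter with %in%:

```r
# Toy data: hypothetical values that mimic the question's columns
DF <- data.frame(
  unique3 = c(100001L, 100001L, 100002L, 100003L),
  flag    = c(NA, 1, NA, NA),
  value   = c(1L, 6L, 1L, 1L)
)

# Group ids with at least one row where flag == 1; which() ignores the NAs
flagged_ids <- unique(DF$unique3[which(DF$flag == 1)])

keep_flagged <- DF[DF$unique3 %in% flagged_ids, ]     # rows of flagged groups
drop_flagged <- DF[!(DF$unique3 %in% flagged_ids), ]  # all other rows
```

This leaves the original NA values in flag untouched, at the cost of not having a uniform per-group flag column afterwards.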