如果第二行符合条件,则从每个组中删除第一行 [英] Remove the first row from each group if the second row meets a condition
问题描述
这是我的数据集的一个示例:
df=data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
"6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"))
我需要计算已经完成的日期与数据集之间的差值:
| id | Date | Buyer | diff |
|----|:----------:|------:|------|
| 9 | 11/29/2018 | John | NA |
| 9 | 11/29/2018 | John | 0 |
| 9 | 11/29/2018 | John | 0 |
| 5 | 5/25/2018 | Maria | -188 |
| 5 | 2/13/2019 | Maria | 264 |
| 5 | 2/13/2019 | Maria | 0 |
| 4 | 6/7/2018 | Sandy | -251 |
| 4 | 6/15/2018 | Sandy | 8 |
| 4 | 6/20/2018 | Sandy | 5 |
| 4 | 8/17/2018 | Sandy | 58 |
| 4 | 8/20/2018 | Sandy | 3 |
| 20 | 12/25/2018 | Paul | 127 |
| 20 | 12/25/2018 | Paul | 0 |
现在,如果列"diff"的每组中第二行的值大于或等于5,那么我需要删除每组的第一行.例如,对于ID为"5"的买方"Maria",差异值264大于5,因此我要删除该组中的第一行,即ID为"5"的买方"Maria",日期为'5/25/2018',差异为'-188'
下面是我的代码示例:
df1=df %>% group_by(Buyer,id) %>%
mutate(diff = c(NA, diff(Date))) %>%
filter(!(diff >=5 & row_number() == 1))
问题在于上面的代码选择的是第一行而不是第二行,并且我不知道如何将差异值应大于或等于5的每个组指定为第二行. /p>
我的预期输出应如下:
| id | Date | Buyer | diff |
|----|:----------:|------:|------|
| 9 | 11/29/2018 | John | NA |
| 9 | 11/29/2018 | John | 0 |
| 9 | 11/29/2018 | John | 0 |
| 5 | 2/13/2019 | Maria | 264 |
| 5 | 2/13/2019 | Maria | 0 |
| 4 | 6/15/2018 | Sandy | 8 |
| 4 | 6/20/2018 | Sandy | 5 |
| 4 | 8/17/2018 | Sandy | 58 |
| 4 | 8/20/2018 | Sandy | 3 |
| 20 | 12/25/2018 | Paul | 127 |
| 20 | 12/25/2018 | Paul | 0 |
我认为您忘记了在df
中提供diff
列.我创建了一个名为diffs
的文件,以使其与功能diff()
不冲突. -
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(diffs = c(NA, diff(as.Date(Date, format = "%m/%d/%Y")))) %>%
filter(
n() == 1 | # always keep if only one row in group
row_number() > 1 | # always keep all row_number() > 1
diffs[2] < 5 # keep 1st row only if 2nd row diffs < 5
) %>%
ungroup()
# A tibble: 11 x 4
id Date Buyer diffs
<chr> <chr> <chr> <dbl>
1 9 11/29/2018 John NA
2 9 11/29/2018 John 0
3 9 11/29/2018 John 0
4 5 2/13/2019 Maria 264
5 5 2/13/2019 Maria 0
6 4 6/15/2018 Sandy 8
7 4 6/20/2018 Sandy 5
8 4 8/17/2018 Sandy 58
9 4 8/20/2018 Sandy 3
10 20 12/25/2018 Paul NA
11 20 12/25/2018 Paul 0
数据-
我添加了stringsAsFactors = FALSE
df1 <- data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
"6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul")
, stringsAsFactors = F)
Here's a sample of my dataset:
df=data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
"6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"))
I need to calculate the difference between dates which I have already done and the dataset then looks like:
| id | Date | Buyer | diff |
|----|:----------:|------:|------|
| 9 | 11/29/2018 | John | NA |
| 9 | 11/29/2018 | John | 0 |
| 9 | 11/29/2018 | John | 0 |
| 5 | 5/25/2018 | Maria | -188 |
| 5 | 2/13/2019 | Maria | 264 |
| 5 | 2/13/2019 | Maria | 0 |
| 4 | 6/7/2018 | Sandy | -251 |
| 4 | 6/15/2018 | Sandy | 8 |
| 4 | 6/20/2018 | Sandy | 5 |
| 4 | 8/17/2018 | Sandy | 58 |
| 4 | 8/20/2018 | Sandy | 3 |
| 20 | 12/25/2018 | Paul | 127 |
| 20 | 12/25/2018 | Paul | 0 |
Now, if the value of second row within each group of column 'diff' is greater than or equal to 5, then I need to delete the first row of each group. For example, the diff value 264 is greater than 5 for Buyer 'Maria' having id '5', so I would want to delete the first row within that group which would be the buyer 'Maria' having id '5', Date as '5/25/2018', and diff as '-188'
Below is a sample of my code:
df1=df %>% group_by(Buyer,id) %>%
mutate(diff = c(NA, diff(Date))) %>%
filter(!(diff >=5 & row_number() == 1))
The problem is that the above code selects the first row instead of the second row and I don't know how to specify the row to be 2nd for each group where the diff value should be greater than or equal to 5.
My expected output should look like:
| id | Date | Buyer | diff |
|----|:----------:|------:|------|
| 9 | 11/29/2018 | John | NA |
| 9 | 11/29/2018 | John | 0 |
| 9 | 11/29/2018 | John | 0 |
| 5 | 2/13/2019 | Maria | 264 |
| 5 | 2/13/2019 | Maria | 0 |
| 4 | 6/15/2018 | Sandy | 8 |
| 4 | 6/20/2018 | Sandy | 5 |
| 4 | 8/17/2018 | Sandy | 58 |
| 4 | 8/20/2018 | Sandy | 3 |
| 20 | 12/25/2018 | Paul | 127 |
| 20 | 12/25/2018 | Paul | 0 |
I think you forgot to provide the diff
column in df
. I created one called diffs
so that it doesn't conflict with the function diff()
. -
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(diffs = c(NA, diff(as.Date(Date, format = "%m/%d/%Y")))) %>%
filter(
n() == 1 | # always keep if only one row in group
row_number() > 1 | # always keep all row_number() > 1
diffs[2] < 5 # keep 1st row only if 2nd row diffs < 5
) %>%
ungroup()
# A tibble: 11 x 4
id Date Buyer diffs
<chr> <chr> <chr> <dbl>
1 9 11/29/2018 John NA
2 9 11/29/2018 John 0
3 9 11/29/2018 John 0
4 5 2/13/2019 Maria 264
5 5 2/13/2019 Maria 0
6 4 6/15/2018 Sandy 8
7 4 6/20/2018 Sandy 5
8 4 8/17/2018 Sandy 58
9 4 8/20/2018 Sandy 3
10 20 12/25/2018 Paul NA
11 20 12/25/2018 Paul 0
Data -
I added stringsAsFactors = FALSE
df1 <- data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
"6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"),
Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul")
, stringsAsFactors = F)
这篇关于如果第二行符合条件,则从每个组中删除第一行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!