如果第二行符合条件,则从每个组中删除第一行 [英] Remove the first row from each group if the second row meets a condition

查看:88
本文介绍了如果第二行符合条件,则从每个组中删除第一行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的数据集的一个示例:

df=data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
  Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
    "6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"), 
  Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"))

我需要计算已经完成的日期与数据集之间的差值:

| id |    Date    | Buyer | diff |
|----|:----------:|------:|------|
| 9  | 11/29/2018 |  John | NA   |
| 9  | 11/29/2018 |  John | 0    |
| 9  | 11/29/2018 |  John | 0    |
| 5  | 5/25/2018  | Maria | -188 |
| 5  | 2/13/2019  | Maria | 264  |
| 5  | 2/13/2019  | Maria | 0    |
| 4  | 6/7/2018   | Sandy | -251 |
| 4  | 6/15/2018  | Sandy | 8    |
| 4  | 6/20/2018  | Sandy | 5    |
| 4  | 8/17/2018  | Sandy | 58   |
| 4  | 8/20/2018  | Sandy | 3    |
| 20 | 12/25/2018 | Paul  | 127  |
| 20 | 12/25/2018 | Paul  | 0    |

现在,如果列"diff"的每组中第二行的值大于或等于5,那么我需要删除每组的第一行.例如,对于ID为"5"的买方"Maria",差异值264大于5,因此我要删除该组中的第一行,即ID为"5"的买方"Maria",日期为'5/25/2018',差异为'-188'

下面是我的代码示例:

df1=df %>% group_by(Buyer,id) %>%
  mutate(diff = c(NA, diff(Date))) %>%
  filter(!(diff >=5 & row_number() == 1))

问题在于上面的代码选择的是第一行而不是第二行,并且我不知道如何将差异值应大于或等于5的每个组指定为第二行. /p>

我的预期输出应如下:

| id |    Date    | Buyer | diff |
|----|:----------:|------:|------|
| 9  | 11/29/2018 |  John | NA   |
| 9  | 11/29/2018 |  John | 0    |
| 9  | 11/29/2018 |  John | 0    |
| 5  | 2/13/2019  | Maria | 264  |
| 5  | 2/13/2019  | Maria | 0    |
| 4  | 6/15/2018  | Sandy | 8    |
| 4  | 6/20/2018  | Sandy | 5    |
| 4  | 8/17/2018  | Sandy | 58   |
| 4  | 8/20/2018  | Sandy | 3    |
| 20 | 12/25/2018 | Paul  | 127  |
| 20 | 12/25/2018 | Paul  | 0    |

解决方案

我认为您忘记了在df中提供diff列.我创建了一个名为diffs的文件,以使其与功能diff()不冲突. -

library(dplyr)

df1 %>% 
  group_by(id) %>% 
  mutate(diffs = c(NA, diff(as.Date(Date, format = "%m/%d/%Y")))) %>% 
  filter(
    n() == 1 |         # always keep if only one row in group
    row_number() > 1 | # always keep all row_number() > 1
    diffs[2] < 5       # keep 1st row only if 2nd row diffs < 5
  ) %>% 
  ungroup()

# A tibble: 11 x 4
   id    Date       Buyer diffs
   <chr> <chr>      <chr> <dbl>
 1 9     11/29/2018 John     NA
 2 9     11/29/2018 John      0
 3 9     11/29/2018 John      0
 4 5     2/13/2019  Maria   264
 5 5     2/13/2019  Maria     0
 6 4     6/15/2018  Sandy     8
 7 4     6/20/2018  Sandy     5
 8 4     8/17/2018  Sandy    58
 9 4     8/20/2018  Sandy     3
10 20    12/25/2018 Paul     NA
11 20    12/25/2018 Paul      0

数据-

我添加了stringsAsFactors = FALSE

df1 <- data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
  Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
    "6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"), 
  Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul")
  , stringsAsFactors = F)

Here's a sample of my dataset:

df=data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
  Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
    "6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"), 
  Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul"))

I need to calculate the difference between dates which I have already done and the dataset then looks like:

| id |    Date    | Buyer | diff |
|----|:----------:|------:|------|
| 9  | 11/29/2018 |  John | NA   |
| 9  | 11/29/2018 |  John | 0    |
| 9  | 11/29/2018 |  John | 0    |
| 5  | 5/25/2018  | Maria | -188 |
| 5  | 2/13/2019  | Maria | 264  |
| 5  | 2/13/2019  | Maria | 0    |
| 4  | 6/7/2018   | Sandy | -251 |
| 4  | 6/15/2018  | Sandy | 8    |
| 4  | 6/20/2018  | Sandy | 5    |
| 4  | 8/17/2018  | Sandy | 58   |
| 4  | 8/20/2018  | Sandy | 3    |
| 20 | 12/25/2018 | Paul  | 127  |
| 20 | 12/25/2018 | Paul  | 0    |

Now, if the value of second row within each group of column 'diff' is greater than or equal to 5, then I need to delete the first row of each group. For example, the diff value 264 is greater than 5 for Buyer 'Maria' having id '5', so I would want to delete the first row within that group which would be the buyer 'Maria' having id '5', Date as '5/25/2018', and diff as '-188'

Below is a sample of my code:

df1=df %>% group_by(Buyer,id) %>%
  mutate(diff = c(NA, diff(Date))) %>%
  filter(!(diff >=5 & row_number() == 1))

The problem is that the above code selects the first row instead of the second row and I don't know how to specify the row to be 2nd for each group where the diff value should be greater than or equal to 5.

My expected output should look like:

| id |    Date    | Buyer | diff |
|----|:----------:|------:|------|
| 9  | 11/29/2018 |  John | NA   |
| 9  | 11/29/2018 |  John | 0    |
| 9  | 11/29/2018 |  John | 0    |
| 5  | 2/13/2019  | Maria | 264  |
| 5  | 2/13/2019  | Maria | 0    |
| 4  | 6/15/2018  | Sandy | 8    |
| 4  | 6/20/2018  | Sandy | 5    |
| 4  | 8/17/2018  | Sandy | 58   |
| 4  | 8/20/2018  | Sandy | 3    |
| 20 | 12/25/2018 | Paul  | 127  |
| 20 | 12/25/2018 | Paul  | 0    |

解决方案

I think you forgot to provide the diff column in df. I created one called diffs so that it doesn't conflict with the function diff(). -

library(dplyr)

df1 %>% 
  group_by(id) %>% 
  mutate(diffs = c(NA, diff(as.Date(Date, format = "%m/%d/%Y")))) %>% 
  filter(
    n() == 1 |         # always keep if only one row in group
    row_number() > 1 | # always keep all row_number() > 1
    diffs[2] < 5       # keep 1st row only if 2nd row diffs < 5
  ) %>% 
  ungroup()

# A tibble: 11 x 4
   id    Date       Buyer diffs
   <chr> <chr>      <chr> <dbl>
 1 9     11/29/2018 John     NA
 2 9     11/29/2018 John      0
 3 9     11/29/2018 John      0
 4 5     2/13/2019  Maria   264
 5 5     2/13/2019  Maria     0
 6 4     6/15/2018  Sandy     8
 7 4     6/20/2018  Sandy     5
 8 4     8/17/2018  Sandy    58
 9 4     8/20/2018  Sandy     3
10 20    12/25/2018 Paul     NA
11 20    12/25/2018 Paul      0

Data -

I added stringsAsFactors = FALSE

df1 <- data.frame(id=c("9","9","9","5","5","5","4","4","4","4","4","20","20"),
  Date=c("11/29/2018","11/29/2018","11/29/2018","5/25/2018","2/13/2019","2/13/2019","6/7/2018",
    "6/15/2018","6/20/2018","8/17/2018","8/20/2018","12/25/2018","12/25/2018"), 
  Buyer= c("John","John","John","Maria","Maria","Maria","Sandy","Sandy","Sandy","Sandy","Sandy","Paul","Paul")
  , stringsAsFactors = F)

这篇关于如果第二行符合条件,则从每个组中删除第一行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆