Select first row in each contiguous run by group

Question

I have data which is grouped by 'ID'. Each 'ID' has different drugs at different dates. Within each consecutive run of 'drug', I would like to keep only the first row. This should be done by group, i.e. within each 'ID'. Two examples are shown in the data:

ID        date    drug  
 1  01/01/2020       A # first row in run 1 of 'A' for ID 1: keep 
 1  07/01/2020       A # 2nd row in run 1 of 'A' for ID 1: drop
 1  09/01/2020       B
 1  15/01/2020       A
 2  01/02/2020       C 
 2  13/02/2020       D
 2  17/02/2020       C # first row in run 2 of 'C' of ID 2: keep 
 2  18/03/2020       C # 2nd row in run 2 of 'C' of ID 2: drop 
 2  19/03/2020       E

Desired output:

ID     date             drug  
1      01/01/2020        A
1      09/01/2020        B
1      15/01/2020        A
2      01/02/2020        C
2      13/02/2020        D
2      17/02/2020        C
2      19/03/2020        E

I have tried the following, but it does not work: it keeps only the first observation for each combination of 'ID' and 'drug', so it also removes drugs that reappear later within the same 'ID'. For example, it would drop 15/01/2020, 17/02/2020 and 18/03/2020.

df_selection <- df %>%   
  group_by(ID) %>% 
  arrange(ID,date) %>% 
  group_by(ID, drug) %>% 
  slice(1L) %>% 
  arrange(ID,date)

I have tried many combinations but I cannot make it work. I'd really appreciate some help!

An additional example to demonstrate a case where the last 'drug' in one 'ID' is the same as the first in the next 'ID', here drug 'B':

ID       date drug
 1 01/01/2020    A
 1 07/01/2020    A
 1 09/01/2020    B # first row in a run of 'B' for ID 1: keep 
 1 15/01/2020    B # 2nd row in a run of 'B' for ID 1: drop 
 2 01/02/2020    B # first row in a run of 'B' for ID 2: keep 
 2 13/02/2020    B # 2nd: drop
 2 17/02/2020    B # 3rd: drop
 2 18/03/2020    E
 2 19/03/2020    E

Recommended answer

df %>% filter(drug != lag(drug, default = ""))
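To check the one-liner against the first sample from the question, here is a minimal reproducible sketch. It assumes `drug` is a character column (so that `default = ""` is a compatible fill value for `lag()`) and that the rows are already sorted by date within each `ID`; the name `result` is only illustrative.

```r
library(dplyr)

# First sample data set from the question, already ordered by ID and date.
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date = c("01/01/2020", "07/01/2020", "09/01/2020", "15/01/2020",
           "01/02/2020", "13/02/2020", "17/02/2020", "18/03/2020",
           "19/03/2020"),
  drug = c("A", "A", "B", "A", "C", "D", "C", "C", "E"),
  stringsAsFactors = FALSE
)

# Keep a row only when its drug differs from the drug on the previous row.
# default = "" makes the very first row compare against an empty string,
# so it is always kept.
result <- df %>% filter(drug != lag(drug, default = ""))

print(result)
```

On this sample the filter drops the rows dated 07/01/2020 and 18/03/2020, which matches the desired output. Note that the comparison ignores `ID`, so a run that continues across an `ID` boundary is treated as a single run.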

Or, if you want to keep first appearance of a drug for one ID even if it matches the last drug for the prior ID (e.g. let's say ID2's first drug was A and we therefore wanted to keep it.):

df %>%
  filter(drug != lag(drug, default = "") |
           ID != lag(ID, default = 0))
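An alternative way to express the same idea, shown here only as a sketch and not as part of the answer above, is to group by `ID` so that `lag()` restarts within each group. It assumes the rows are sorted by date within each `ID` and that `drug` contains no missing values; the names `df2` and `result2` are illustrative. The data is the second sample from the question, where drug 'B' ends ID 1 and starts ID 2.

```r
library(dplyr)

# Second sample data set from the question: drug "B" is the last drug of
# ID 1 and the first drug of ID 2.
df2 <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date = c("01/01/2020", "07/01/2020", "09/01/2020", "15/01/2020",
           "01/02/2020", "13/02/2020", "17/02/2020", "18/03/2020",
           "19/03/2020"),
  drug = c("A", "A", "B", "B", "B", "B", "B", "E", "E"),
  stringsAsFactors = FALSE
)

result2 <- df2 %>%
  group_by(ID) %>%
  # Within each ID, keep the first row and every row whose drug differs
  # from the drug on the previous row of the same ID.
  filter(row_number() == 1 | drug != lag(drug)) %>%
  ungroup()

print(result2)
```

Because `lag()` is evaluated per group, the first row of ID 2 is kept even though its drug equals the last drug of ID 1, and no comparison on `ID` itself is needed.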
