Select first row in each contiguous run by group

Published: 2021/7/19 18:44:33. Tags: r, dplyr, group-by, sequence


Question


I have data which is grouped by 'ID'. Each 'ID' has different drugs at different dates. Within each consecutive run of 'drug', I would like to keep only the first row. This should be done by group, i.e. within each 'ID'. Two examples are shown in the data:

ID        date    drug  
 1  01/01/2020       A # first row in run 1 of 'A' for ID 1: keep 
 1  07/01/2020       A # 2nd row in run 1 of 'A' for ID 1: drop
 1  09/01/2020       B
 1  15/01/2020       A
 2  01/02/2020       C 
 2  13/02/2020       D
 2  17/02/2020       C # first row in run 2 of 'C' of ID 2: keep 
 2  18/03/2020       C # 2nd row in run 2 of 'C' of ID 2: drop 
 2  19/03/2020       E

Desired output:

ID        date    drug
 1  01/01/2020       A
 1  09/01/2020       B
 1  15/01/2020       A
 2  01/02/2020       C
 2  13/02/2020       D
 2  17/02/2020       C
 2  19/03/2020       E
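To experiment with the answers, the first example can be rebuilt as a data frame. This is a minimal sketch that assumes the columns are named `ID`, `date` and `drug` exactly as shown. The 15/01/2020 entry shows the dates are day/month/year, so they are parsed with an explicit format rather than sorted as text (sorting the raw strings would order them by day of month first).

```r
# Example 1 from the question, rebuilt as a data frame (base R only).
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date = c("01/01/2020", "07/01/2020", "09/01/2020", "15/01/2020",
           "01/02/2020", "13/02/2020", "17/02/2020", "18/03/2020",
           "19/03/2020"),
  drug = c("A", "A", "B", "A", "C", "D", "C", "C", "E"),
  stringsAsFactors = FALSE  # keep drug as character, not factor
)

# Dates are day/month/year (15/01/2020 cannot be month/day), so parse them
# explicitly instead of relying on the string order.
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# "Consecutive" is defined by row order, so sort by ID, then by date.
df <- df[order(df$ID, df$date), ]
```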


I have tried the following, but it doesn't work: grouping by both 'ID' and 'drug' keeps only the first row of each drug within an ID, so it also removes later runs of a drug that has already appeared, e.g. it drops 15/01/2020 and 17/02/2020.

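library(dplyr)

df_selection <- df %>%
  group_by(ID) %>%
  arrange(ID, date) %>%
  # One group per (ID, drug) pair, whether or not its rows are consecutive...
  group_by(ID, drug) %>%
  slice(1L) %>%   # ...so this keeps only the first occurrence of each drug per ID
  arrange(ID, date)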
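Running the attempt on the first example makes the failure concrete. A sketch assuming the example data as shown, with dates parsed as day/month/year; the comparison against the desired output lists which rows the attempt loses.

```r
library(dplyr)

# Example 1 from the question; dates parsed as day/month/year.
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date = as.Date(c("01/01/2020", "07/01/2020", "09/01/2020", "15/01/2020",
                   "01/02/2020", "13/02/2020", "17/02/2020", "18/03/2020",
                   "19/03/2020"), format = "%d/%m/%Y"),
  drug = c("A", "A", "B", "A", "C", "D", "C", "C", "E"),
  stringsAsFactors = FALSE
)

# The attempted pipeline: one group per (ID, drug) pair.
attempt <- df %>%
  arrange(ID, date) %>%
  group_by(ID, drug) %>%
  slice(1L) %>%
  ungroup() %>%
  arrange(ID, date)

# Dates from the desired output that the attempt fails to return.
desired <- c("01/01/2020", "09/01/2020", "15/01/2020", "01/02/2020",
             "13/02/2020", "17/02/2020", "19/03/2020")
missing_dates <- setdiff(desired, format(attempt$date, "%d/%m/%Y"))
print(missing_dates)  # "15/01/2020" "17/02/2020"
```

Both missing rows start a second run of a drug that already appeared for the same ID, which is exactly what grouping by drug cannot distinguish.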


I have tried many combinations but I cannot make it work. I'd really appreciate some help!


An additional example to demonstrate a case where the last 'drug' in one 'ID' is the same as the first in the next 'ID', here drug 'B':

ID       date drug
 1 01/01/2020    A
 1 07/01/2020    A
 1 09/01/2020    B # first row in a run of 'B' for ID 1: keep 
 1 15/01/2020    B # 2nd row in a run of 'B' for ID 1: drop 
 2 01/02/2020    B # first row in a run of 'B' for ID 2: keep 
 2 13/02/2020    B # 2nd: drop
 2 17/02/2020    B # 3rd: drop
 2 18/03/2020    E
 2 19/03/2020    E
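One way to see why the ID boundary matters is to number the runs. This sketch assumes dplyr >= 1.1.0, which provides `consecutive_id()`; the dates are left out because the rows are already in date order within each ID.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Additional example from the question (dates omitted: rows are already
# in date order within each ID).
df2 <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  drug = c("A", "A", "B", "B", "B", "B", "B", "E", "E"),
  stringsAsFactors = FALSE
)

runs <- df2 %>%
  mutate(
    # Ignores ID: the B rows of ID 1 and ID 2 form a single run.
    run_drug_only = consecutive_id(drug),
    # Starts a new run whenever ID or drug changes, splitting B at the ID boundary.
    run_by_id = consecutive_id(ID, drug)
  )
print(runs)
```

Keeping the first row of each `run_by_id` value gives the wanted result; keeping the first row of each `run_drug_only` value would lose ID 2's first 'B'.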

Recommended answer

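library(dplyr)

# Keep a row only when its drug differs from the previous row's drug.
# Assumes df is already sorted by ID and date.
df %>% filter(drug != lag(drug, default = ""))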
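The one-liner above can be checked against both examples from the question. In this sketch `keep_run_starts()` is a hypothetical helper that only wraps the one-liner; the data frames omit the dates and assume the rows are already in ID/date order, and `drug` is kept as character so the `""` default compares cleanly.

```r
library(dplyr)

# Both examples from the question, already in ID/date order (dates omitted).
ex1 <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  drug = c("A", "A", "B", "A", "C", "D", "C", "C", "E"),
  stringsAsFactors = FALSE
)
ex2 <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  drug = c("A", "A", "B", "B", "B", "B", "B", "E", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the one-liner: keep a row when its drug differs
# from the drug in the row directly above it (ungrouped, so ID is ignored).
keep_run_starts <- function(d) d %>% filter(drug != lag(drug, default = ""))

out1 <- keep_run_starts(ex1)  # 7 rows: matches the desired output
out2 <- keep_run_starts(ex2)  # 3 rows: ID 2's first "B" is dropped
```

Because `lag()` runs over the whole, ungrouped data frame, the first row of ID 2 is compared with the last row of ID 1, which is why the first example works and the additional example does not.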


Or, if you want to keep the first row of a run for an ID even when its drug matches the last drug of the previous ID (for example, ID 2's first 'B' in the additional example), also keep every row where 'ID' changes:

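df %>%
  # Keep a row if the drug changes from the previous row, or if it is
  # the first row of a new ID (assumes no ID equals 0)
  filter(drug != lag(drug, default = "") |
           ID != lag(ID, default = 0))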
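The ID-aware filter above can also be written as a grouped filter or with explicit run ids. This sketch compares the three forms on the additional example; it assumes dplyr >= 1.1.0 for `consecutive_id()`, and the variable names are illustrative only.

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Additional example from the question, already in ID/date order (dates omitted).
ex2 <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  drug = c("A", "A", "B", "B", "B", "B", "B", "E", "E"),
  stringsAsFactors = FALSE
)

# The answer's ID-aware filter (assumes no ID equals 0 and no drug equals "").
by_lag_id <- ex2 %>%
  filter(drug != lag(drug, default = "") | ID != lag(ID, default = 0))

# Grouped variant: lag() is evaluated within each ID, so the first row of every
# ID is compared with the default "" and is always kept.
by_group <- ex2 %>%
  group_by(ID) %>%
  filter(drug != lag(drug, default = "")) %>%
  ungroup()

# Run-id variant: number each run of identical (ID, drug) values,
# then keep the first row of each run.
by_run <- ex2 %>%
  mutate(run = consecutive_id(ID, drug)) %>%
  distinct(run, .keep_all = TRUE) %>%
  select(-run)
```

All three keep the same four rows (1 A, 1 B, 2 B, 2 E). The grouped form avoids choosing a sentinel value for `ID`, while the run-id form makes each run an explicit column.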

