Select first row in each contiguous run by group
Question
I have data which is grouped by 'ID'. Each 'ID' has different drugs at different dates. Within each consecutive run of 'drug', I would like to keep only the first row. This should be done by group, i.e. within each 'ID'. Two examples are shown in the data:
ID date drug
1 01/01/2020 A # first row in run 1 of 'A' for ID 1: keep
1 07/01/2020 A # 2nd row in run 1 of 'A' for ID 1: drop
1 09/01/2020 B
1 15/01/2020 A
2 01/02/2020 C
2 13/02/2020 D
2 17/02/2020 C # first row in run 2 of 'C' of ID 2: keep
2 18/03/2020 C # 2nd row in run 2 of 'C' of ID 2: drop
2 19/03/2020 E
Desired output:
ID date drug
1 01/01/2020 A
1 09/01/2020 B
1 15/01/2020 A
2 01/02/2020 C
2 13/02/2020 D
2 17/02/2020 C
2 19/03/2020 E
I have tried the following, but it does not work: it keeps only the first row for each combination of 'ID' and 'drug', so it also removes rows where a drug reappears later within the same 'ID'. For example, it would drop 15/01/2020, 17/02/2020 and 18/03/2020.
df_selection <- df %>%
  group_by(ID) %>%
  arrange(ID, date) %>%
  group_by(ID, drug) %>%
  slice(1L) %>%
  arrange(ID, date)
I have tried many combinations but I cannot make it work. I'd really appreciate some help!
An additional example to demonstrate a case where the last 'drug' in one 'ID' is the same as the first in the next 'ID', here drug 'B':
ID date drug
1 01/01/2020 A
1 07/01/2020 A
1 09/01/2020 B # first row in a run of 'B' for ID 1: keep
1 15/01/2020 B # 2nd row in a run of 'B' for ID 1: drop
2 01/02/2020 B # first row in a run of 'B' for ID 2: keep
2 13/02/2020 B # 2nd: drop
2 17/02/2020 B # 3rd: drop
2 18/03/2020 E
2 19/03/2020 E
Recommended answer
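Assuming the rows are already sorted by ID and date, keep a row only when its drug differs from the drug on the previous row:
df %>% filter(drug != lag(drug, default = ""))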
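To try this out, here is a minimal, self-contained sketch on the first example from the question. The way `df` is built is my assumption (character `drug`, dates kept as dd/mm/yyyy strings, rows already in ID and date order); only the `filter()` line comes from the answer itself.

```r
library(dplyr)

# First example from the question; assumed to be sorted by ID and date already.
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date = c("01/01/2020", "07/01/2020", "09/01/2020", "15/01/2020",
           "01/02/2020", "13/02/2020", "17/02/2020", "18/03/2020", "19/03/2020"),
  drug = c("A", "A", "B", "A", "C", "D", "C", "C", "E"),
  stringsAsFactors = FALSE
)

# lag(drug) shifts the column down by one row; the first row is compared
# against the default "" and is therefore always kept.
result <- df %>% filter(drug != lag(drug, default = ""))
result
```

On this data the two repeated rows (07/01/2020 and 18/03/2020) are dropped and the other seven rows remain, which matches the desired output. Note that the comparison ignores `ID`, so a run that continues across an ID boundary is treated as one run.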
Or, if you want to keep the first appearance of a drug for an ID even when it matches the last drug of the prior ID (say ID 2's first drug was A and we wanted to keep it):
df %>%
  filter(drug != lag(drug, default = "") |
           ID != lag(ID, default = 0))
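A sketch of how this behaves on the second example from the question, where ID 1 ends with 'B' and ID 2 starts with 'B'. The data frame `df2` and the grouped variant are my additions, not part of the original answer. The grouped variant assumes that computing `lag()` within each ID is an acceptable alternative to comparing IDs explicitly.

```r
library(dplyr)

# Second example from the question; assumed to be sorted by ID and date already.
df2 <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date = c("01/01/2020", "07/01/2020", "09/01/2020", "15/01/2020",
           "01/02/2020", "13/02/2020", "17/02/2020", "18/03/2020", "19/03/2020"),
  drug = c("A", "A", "B", "B", "B", "B", "B", "E", "E"),
  stringsAsFactors = FALSE
)

# Keep a row when the drug changes OR when a new ID starts.
by_lag <- df2 %>%
  filter(drug != lag(drug, default = "") |
           ID != lag(ID, default = 0))

# Grouped variant: lag() is evaluated within each ID, so the first row of
# every ID is compared against the default "" rather than the previous ID.
by_group <- df2 %>%
  group_by(ID) %>%
  filter(drug != lag(drug, default = "")) %>%
  ungroup()
```

Both versions keep four rows here (01/01/2020 A, 09/01/2020 B, 01/02/2020 B, 18/03/2020 E). The plain one-liner without the ID check would drop ID 2's first 'B'. If the data are not sorted yet, it is probably worth parsing the dates first, e.g. with `as.Date(date, format = "%d/%m/%Y")`, since dd/mm/yyyy strings do not sort chronologically.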