Count distinct by group: moving window
Question
Let's say I have a dataset containing visits in a hospital. My goal is to generate a variable that counts the number of unique patients the visitor has seen up to and including the date of each visit. I often work with dplyr's group_by, but this seems a little tricky. I guess I would have to use group_by, n_distinct, and sum, or some kind of moving-window command. The "goal" variable is what I need.
visitor visitdt patient goal
125469 1/12/2018 15200 1
125469 1/19/2018 15200 1
125469 2/16/2018 15200 1
125469 2/23/2018 52607 2
125469 3/9/2018 52607 2
125469 3/16/2018 52607 2
125469 3/23/2018 15200 2
125469 3/29/2018 15200 2
125469 3/30/2018 20589 3
125469 4/6/2018 20589 3
Thanks, Marvin
Recommended answer
You can do the following:
with(df, ave(patient, visitor, FUN = function(x) cumsum(!duplicated(x))))
[1] 1 1 1 2 2 2 2 2 3 3
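Essentially, it is a cumulative sum of non-duplicated values per group: duplicated() flags every patient already seen earlier within the visitor's rows, so each first appearance adds one to the running count. This relies on the rows already being in visit-date order within each visitor.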
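To make the mechanics concrete, here is a minimal sketch that applies the same idea to one visitor's patient IDs, copied from the question's sample data. The variable names `first_seen` and `running_distinct` are only illustrative and do not come from the original answer.

```r
# Patient IDs for a single visitor, in visit-date order (from the question's data)
patient <- c(15200, 15200, 15200, 52607, 52607, 52607, 15200, 15200, 20589, 20589)

# duplicated() is TRUE for any value that already appeared earlier in the vector,
# so negating it marks the first visit to each patient
first_seen <- !duplicated(patient)

# The cumulative sum of those flags is the running count of distinct patients
running_distinct <- cumsum(first_seen)
print(running_distinct)
# [1] 1 1 1 2 2 2 2 2 3 3
```

Wrapping this in ave() (or a grouped mutate) simply restarts the count for each visitor.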
You can also do the same with dplyr:
library(dplyr)

df %>%
  group_by(visitor) %>%
  mutate(res = cumsum(!duplicated(patient)))
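For a fully reproducible run, the following sketch rebuilds the question's sample data and checks the result against the "goal" column. It assumes dplyr is installed. The date parsing and the arrange() step are additions that are not part of the original answer; they only matter if the rows are not already in chronological order.

```r
library(dplyr)

# Sample data from the question; visitdt is stored as month/day/year text
df <- data.frame(
  visitor = 125469,
  visitdt = c("1/12/2018", "1/19/2018", "2/16/2018", "2/23/2018", "3/9/2018",
              "3/16/2018", "3/23/2018", "3/29/2018", "3/30/2018", "4/6/2018"),
  patient = c(15200, 15200, 15200, 52607, 52607, 52607, 15200, 15200, 20589, 20589),
  goal    = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3)
)

out <- df %>%
  # Parse the text dates so sorting is chronological rather than alphabetical
  mutate(visitdt = as.Date(visitdt, format = "%m/%d/%Y")) %>%
  # Added safeguard: the running count depends on row order
  arrange(visitor, visitdt) %>%
  group_by(visitor) %>%
  mutate(res = cumsum(!duplicated(patient))) %>%
  ungroup()

# res should match the goal column on every row
print(all(out$res == out$goal))
```

Calling ungroup() at the end is a design choice so that later steps do not silently keep operating per visitor.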