Count distinct by group - moving window

Problem description

Let's say I have a dataset containing visits to a hospital. My goal is to generate a variable that counts the number of unique patients the visitor has seen up to and including the date of each visit. I often work with dplyr's group_by, but this seems a little tricky. I guess I would have to use group_by, n_distinct, and sum, or some kind of moving-window command. The "goal" variable is what I need.

visitor  visitdt    patient  goal
125469   1/12/2018  15200    1
125469   1/19/2018  15200    1
125469   2/16/2018  15200    1
125469   2/23/2018  52607    2
125469   3/9/2018   52607    2
125469   3/16/2018  52607    2
125469   3/23/2018  15200    2
125469   3/29/2018  15200    2
125469   3/30/2018  20589    3
125469   4/6/2018   20589    3

Thanks, Marvin

Recommended answer

You can do:

with(df, ave(patient, visitor, FUN = function(x) cumsum(!duplicated(x))))

 [1] 1 1 1 2 2 2 2 2 3 3

Essentially, it is a cumulative sum of non-duplicated values per group.
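To see why this works, the two steps can be unpacked for a single visitor. This is only a minimal sketch: the vector below copies the patient column from the question's table, and the intermediate names `first_seen` and `goal` are chosen here for illustration.

```r
# Patient IDs for visitor 125469, in visit order (taken from the question's table)
patient <- c(15200, 15200, 15200, 52607, 52607, 52607, 15200, 15200, 20589, 20589)

# TRUE the first time a patient ID appears, FALSE on every later repeat
first_seen <- !duplicated(patient)

# Running total of first appearances = distinct patients seen so far
goal <- cumsum(first_seen)
goal
#  [1] 1 1 1 2 2 2 2 2 3 3
```

Because `duplicated()` only flags repeats of values seen earlier in the vector, patient 15200 reappearing on 3/23/2018 does not increase the count.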

And you can also do the same with dplyr:

library(dplyr)

df %>%
  group_by(visitor) %>%
  mutate(res = cumsum(!duplicated(patient)))
