Count distinct by group: moving window


Problem description

Let's say I have a dataset containing visits to a hospital. My goal is to generate a variable that counts, for each visit, the number of unique patients the visitor has seen up to and including the date of that visit. I often work with dplyr's group_by, but this seems a little tricky. I guess I would have to use group_by, n_distinct, and sum, or some kind of moving-window command. The "goal" column below is the variable I need.

visitor  visitdt    patient  goal
125469   1/12/2018  15200    1
125469   1/19/2018  15200    1
125469   2/16/2018  15200    1
125469   2/23/2018  52607    2
125469   3/9/2018   52607    2
125469   3/16/2018  52607    2
125469   3/23/2018  15200    2
125469   3/29/2018  15200    2
125469   3/30/2018  20589    3
125469   4/6/2018   20589    3

Thanks, Marvin

Recommended answer

You can do:

with(df, ave(patient, visitor, FUN = function(x) cumsum(!duplicated(x))))

 [1] 1 1 1 2 2 2 2 2 3 3

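Essentially, it is a cumulative sum of non-duplicated values per group: within each visitor, !duplicated(patient) is TRUE the first time a patient appears and FALSE afterwards, so the running total is the number of distinct patients seen so far. This assumes the rows are already ordered by visit date within each visitor.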
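To make the idea concrete, here is a minimal, self-contained sketch in base R that rebuilds the example data from the question and applies the same cumsum(!duplicated()) approach. The month/day/year date format and the explicit sort step are assumptions on my part, not something the original answer states; the sort is there only so that "seen so far" follows calendar order if the input is not already sorted.

```r
# Rebuild the example data from the question (column names as in the post).
df <- data.frame(
  visitor = rep(125469, 10),
  visitdt = c("1/12/2018", "1/19/2018", "2/16/2018", "2/23/2018", "3/9/2018",
              "3/16/2018", "3/23/2018", "3/29/2018", "3/30/2018", "4/6/2018"),
  patient = c(15200, 15200, 15200, 52607, 52607,
              52607, 15200, 15200, 20589, 20589)
)

# Assumption: dates are month/day/year. Parse them and sort by visitor and
# date so the running count follows time.
df$visitdt <- as.Date(df$visitdt, format = "%m/%d/%Y")
df <- df[order(df$visitor, df$visitdt), ]

# duplicated() is FALSE at the first occurrence of each patient, so its
# negation flags "new" patients; cumsum() turns the flags into a running count.
# ave() applies the function separately within each visitor.
df$goal <- with(df, ave(patient, visitor,
                        FUN = function(x) cumsum(!duplicated(x))))

print(df$goal)
```

On this data the printed vector should match the "goal" column in the question. Note that the sketch relies on patient being numeric, as it is in the example; with a character or factor column the ave() result may need converting.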

And you can also do the same with dplyr:

library(dplyr)

# Running count of distinct patients within each visitor
df %>%
 group_by(visitor) %>%
 mutate(res = cumsum(!duplicated(patient)))
