Cumulative count of unique values per group
Problem description
I have a df with names and some dates of eligibility status. I would like to create an indicator of how many unique elig_end_date values a person has accumulated over time, i.e. a running count in claim-date order. Here is my df:
names date_of_claim elig_end_date
1 tom 2010-01-01 2010-07-01
2 tom 2010-05-04 2010-07-01
3 tom 2010-06-01 2014-01-01
4 tom 2010-10-10 2014-01-01
5 mary 2010-03-01 2014-06-14
6 mary 2010-05-01 2014-06-14
7 mary 2010-08-01 2014-06-14
8 mary 2010-11-01 2014-06-14
9 mary 2011-01-01 2014-06-14
10 john 2010-03-27 2011-03-01
11 john 2010-07-01 2011-03-01
12 john 2010-11-01 2011-03-01
13 john 2011-02-01 2011-03-01
Here is my desired output:
names date_of_claim elig_end_date obs
1 tom 2010-01-01 2010-07-01 1
2 tom 2010-05-04 2010-07-01 1
3 tom 2010-06-01 2014-01-01 2
4 tom 2010-10-10 2014-01-01 2
5 mary 2010-03-01 2014-06-14 1
6 mary 2010-05-01 2014-06-14 1
7 mary 2010-08-01 2014-06-14 1
8 mary 2010-11-01 2014-06-14 1
9 mary 2011-01-01 2014-06-14 1
10 john 2010-03-27 2011-03-01 1
11 john 2010-07-01 2011-03-01 1
12 john 2010-11-01 2011-03-01 1
13 john 2011-02-01 2011-03-01 1
I found this post useful, R: Count unique values by category, but the answers are given as a separate table rather than being included in the df.
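For contrast, a per-person count of distinct end dates can be attached to every row with ave, but it is a total rather than a running count. A minimal sketch, assuming the dates are stored as character; the vectors person and end are hypothetical stand-ins for the df columns:

```r
# Hypothetical stand-ins for df$names and df$elig_end_date (dates as character)
person <- c(rep("tom", 4), rep("mary", 5), rep("john", 4))
end <- c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01",
         rep("2014-06-14", 5), rep("2011-03-01", 4))

# Row indices are split by person; each group gets its number of distinct dates,
# which ave recycles onto every row of that group
total_unique <- ave(seq_along(end), person,
                    FUN = function(i) length(unique(end[i])))
print(total_unique)
```

Here tom gets 2 on every row, whereas the desired output has 1, 1, 2, 2, so a per-group total is not enough on its own.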
I also tried this:
df$ob = ave(df$elig_end_date, df$elig_end_date, FUN=seq_along)
But this creates a count, and I really just want an indicator.
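A minimal sketch of why that attempt yields a count: grouping by elig_end_date alone ignores names, and seq_along simply numbers the rows within each date. It assumes elig_end_date is stored as character and passes row indices to ave so the result stays an integer vector; both are assumptions, not details from the question.

```r
# elig_end_date values from the question, assumed to be character
end <- c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01",
         rep("2014-06-14", 5), rep("2011-03-01", 4))

# Same grouping as the attempt: by the date itself, numbering rows within each date
attempt <- ave(seq_along(end), end, FUN = seq_along)
print(attempt)
```

For these data it prints 1 2 1 2 1 2 3 4 5 1 2 3 4: a row counter within each end date, not a per-person indicator.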
Thanks in advance.
Output of Stephen's code (this is not the correct code; it is posted only as a learning point):
names date_of_claim elig_end_date ob
1 tom 2010-01-01 2010-07-01 2
2 tom 2010-05-04 2010-07-01 2
3 tom 2010-06-01 2014-01-01 2
4 tom 2010-10-10 2014-01-01 2
5 mary 2010-03-01 2014-06-14 5
6 mary 2010-05-01 2014-06-14 5
7 mary 2010-08-01 2014-06-14 5
8 mary 2010-11-01 2014-06-14 5
9 mary 2011-01-01 2014-06-14 5
10 john 2010-03-27 2011-03-01 4
11 john 2010-07-01 2011-03-01 4
12 john 2010-11-01 2011-03-01 4
13 john 2011-02-01 2011-03-01 4
Recommended answer
Another possibility, using ave:
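# Count distinct elig_end_date values seen so far within each name;
# factor codes keep the result integer whether the column is character, factor or Date
df$obs <- with(df, ave(as.integer(factor(elig_end_date)), names,
                       FUN = function(x) cumsum(!duplicated(x))))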
# names date_of_claim elig_end_date obs
# 1 tom 2010-01-01 2010-07-01 1
# 2 tom 2010-05-04 2010-07-01 1
# 3 tom 2010-06-01 2014-01-01 2
# 4 tom 2010-10-10 2014-01-01 2
# 5 mary 2010-03-01 2014-06-14 1
# 6 mary 2010-05-01 2014-06-14 1
# 7 mary 2010-08-01 2014-06-14 1
# 8 mary 2010-11-01 2014-06-14 1
# 9 mary 2011-01-01 2014-06-14 1
# 10 john 2010-03-27 2011-03-01 1
# 11 john 2010-07-01 2011-03-01 1
# 12 john 2010-11-01 2011-03-01 1
# 13 john 2011-02-01 2011-03-01 1
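A self-contained sketch of the answer's approach. It rebuilds the question's data with the dates stored as character strings (an assumption; the question does not show how df was created) and wraps elig_end_date in as.integer(factor(...)), an adjustment not in the original answer, so that ave returns integers regardless of the column's class. It relies on the rows already being in date_of_claim order within each person, as they are in the question. The first lines show the core idea on a hypothetical vector: !duplicated(x) is TRUE the first time a value appears, and cumsum turns those TRUE values into a running count of distinct values.

```r
# Core idea on a hypothetical vector: number of distinct values seen so far
x <- c("a", "a", "b", "a", "c")
print(cumsum(!duplicated(x)))  # 1 1 2 2 3

# Rebuild the question's data (dates as character is an assumption)
df <- data.frame(
  names = c(rep("tom", 4), rep("mary", 5), rep("john", 4)),
  date_of_claim = c("2010-01-01", "2010-05-04", "2010-06-01", "2010-10-10",
                    "2010-03-01", "2010-05-01", "2010-08-01", "2010-11-01",
                    "2011-01-01", "2010-03-27", "2010-07-01", "2010-11-01",
                    "2011-02-01"),
  elig_end_date = c("2010-07-01", "2010-07-01", "2014-01-01", "2014-01-01",
                    rep("2014-06-14", 5), rep("2011-03-01", 4)),
  stringsAsFactors = FALSE
)

# Within each name, each new elig_end_date bumps the indicator by one, in row order
df$obs <- with(df, ave(as.integer(factor(elig_end_date)), names,
                       FUN = function(x) cumsum(!duplicated(x))))
print(df)
```

A value that reappears later (the second "a") leaves the count unchanged, so the indicator never decreases. If the rows were not already sorted, ordering them by names and date_of_claim first would be needed for the count to follow time.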