Count occurrence of IDs within the last x days in R

Views: 20 | Published: 2021/4/28 19:41:33 | Tags: r, dplyr, data.table

Question

This is my first post on Stack Overflow, so please forgive me if my post is not detailed enough.

I have a data table with two columns (date and group ID). At the current date, I want to count the number of group occurrences that have occurred within the last x days. For my example below, we can say the last 30 days.

library(data.table)

date = c("2014-04-01", "2014-04-12", "2014-04-07", "2014-05-03", "2014-04-14", "2014-05-04", "2014-03-31", "2014-04-18", "2014-04-23", "2014-04-01")
group = c("G","G","F","G","E","E","H","H","H","A")
# Both columns are character vectors at this point; date is converted later
dt = data.table(group = group, date = date)

    group       date
 1:     G 2014-04-01
 2:     G 2014-04-12
 3:     F 2014-04-07
 4:     G 2014-05-03
 5:     E 2014-04-14
 6:     E 2014-05-04
 7:     H 2014-03-31
 8:     H 2014-04-18
 9:     H 2014-04-23
10:     A 2014-04-01

So, my desired new column would look like this:

    group       date count
 1:     G 2014-04-01     0
 2:     G 2014-04-12     1
 3:     F 2014-04-07     0
 4:     G 2014-05-03     1 (not including first G since it is outside 30 days)
 5:     E 2014-04-14     0
 6:     E 2014-05-04     1
 7:     H 2014-03-31     0
 8:     H 2014-04-18     1
 9:     H 2014-04-23     2
10:     A 2014-04-01     0
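
To pin down what the desired column means, here is a minimal brute-force sketch. It assumes that "within the last 30 days" means strictly earlier dates in the same group that are at most 30 days old, so same-day rows are not counted. The helper name `count_prior` and its `window` argument are hypothetical, introduced only for illustration. It is quadratic per group, so it is useful as a reference check on small data rather than on millions of rows.

```r
library(data.table)

dt <- data.table(
  group = c("G", "G", "F", "G", "E", "E", "H", "H", "H", "A"),
  date  = as.Date(c("2014-04-01", "2014-04-12", "2014-04-07", "2014-05-03",
                    "2014-04-14", "2014-05-04", "2014-03-31", "2014-04-18",
                    "2014-04-23", "2014-04-01"))
)

# Hypothetical helper: for each date, count the strictly earlier dates in the
# same vector that fall within `window` days before it.
count_prior <- function(d, window = 30) {
  vapply(seq_along(d),
         function(i) sum(d < d[i] & d >= d[i] - window),
         integer(1))
}

# Apply per group; := keeps the original row order
dt[, count := count_prior(date), by = group]
print(dt)
```

On the sample data this reproduces the desired column shown above.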

I was able to use dplyr to compute a non-windowed running count of each group's earlier occurrences, but I am struggling to find a way to restrict that count to the last 30 days. For the non-windowed count, I did the following:

library(dplyr)

# Running count of earlier rows per group (no date window applied)
dt = data.table(dt %>%
 group_by(group) %>%
 mutate(count = row_number() - 1))

    group       date count
 1:     G 2014-04-01     0
 2:     G 2014-04-12     1
 3:     F 2014-04-07     0
 4:     G 2014-05-03     2
 5:     E 2014-04-14     0
 6:     E 2014-05-04     1
 7:     H 2014-03-31     0
 8:     H 2014-04-18     1
 9:     H 2014-04-23     2
10:     A 2014-04-01     0

This is a small sample of the dataset. The entire dataset contains a few million rows, so I need something efficient. Any tips or suggestions would be greatly appreciated. Thank you in advance!

Recommended answer

A data.table option

dt[, date := as.Date(date)][, count := cumsum(date <= first(date) + 30) - 1, group]

which gives

> dt
    group       date count
 1:     G 2014-04-01     0
 2:     G 2014-04-12     1
 3:     F 2014-04-07     0
 4:     G 2014-05-03     1
 5:     E 2014-04-14     0
 6:     E 2014-05-04     1
 7:     H 2014-03-31     0
 8:     H 2014-04-18     1
 9:     H 2014-04-23     2
10:     A 2014-04-01     0
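
To see how the `cumsum` expression arrives at these numbers, the sketch below walks through group G by hand in base R. The variable names `d`, `in_window`, and `res` are illustrative only. Note that the comparison is made against the group's first date plus 30 days, not against each row's own date.

```r
# Dates of group G, in the order they appear in the table
d <- as.Date(c("2014-04-01", "2014-04-12", "2014-05-03"))

# TRUE where a date lies within 30 days of the group's FIRST date (2014-05-01 cutoff)
in_window <- d <= d[1] + 30
print(in_window)   # TRUE TRUE FALSE

# Running total of TRUEs, minus 1 so the first row starts at zero
res <- cumsum(in_window) - 1
print(res)         # 0 1 1
```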

A dplyr option following a similar idea

dt %>%
  mutate(date = as.Date(date)) %>%
  group_by(group) %>%
  mutate(count = cumsum(date <= first(date) + 30) - 1) %>%
  ungroup()
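
Both expressions above anchor the 30-day window to each group's first date. That matches the sample, but it can differ from a per-row look-back once a group spans more than 30 days. If a true per-row window is what is needed, one possible sketch uses base R's `findInterval` on sorted dates, which avoids comparing every pair of rows. This is an alternative technique, not the answer's method. The helper name `count_prior_fast` and its `window` argument are hypothetical, and it assumes that same-day rows should not count as prior occurrences.

```r
library(data.table)

dt <- data.table(
  group = c("G", "G", "F", "G", "E", "E", "H", "H", "H", "A"),
  date  = as.Date(c("2014-04-01", "2014-04-12", "2014-04-07", "2014-05-03",
                    "2014-04-14", "2014-05-04", "2014-03-31", "2014-04-18",
                    "2014-04-23", "2014-04-01"))
)

# Hypothetical helper: per-row count of strictly earlier dates within `window` days.
# With left.open = TRUE, findInterval(v, sx) is the number of sorted values < v, so
#   (# dates < d) - (# dates < d - window) = # dates in [d - window, d).
count_prior_fast <- function(d, window = 30) {
  x  <- as.numeric(d)   # days since 1970-01-01
  sx <- sort(x)
  findInterval(x, sx, left.open = TRUE) -
    findInterval(x - window, sx, left.open = TRUE)
}

dt[, count := count_prior_fast(date), by = group]
print(dt)

# A case where the two approaches disagree: the second date is 40 days after the
# first, and the third is 5 days after the second.
late <- as.Date(c("2014-01-01", "2014-02-10", "2014-02-15"))
print(count_prior_fast(late))                 # 0 0 1
print(cumsum(late <= late[1] + 30) - 1)       # 0 0 0
```

Sorting dominates the cost, so the work per group is roughly proportional to n log n rather than n squared. Whether that is fast enough for a few million rows would need to be measured on the real data.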
