Calculate number of occurrences within a specific time period

Problem description

I have the following data, where ID stands for an individual, Date for the date, and Purchased for whether somebody made a purchase (I created this last column so that I can count the occurrences):

   ID       Date Purchased
1   1 2017-01-01         1
2   1 2017-08-03         1
3   1 2017-09-02         1
4   2 2017-09-04         1
5   2 2018-07-12         1
6   2 2018-11-03         1
7   2 2018-12-05         1
8   2 2019-01-01         1
9   3 2018-02-03         1
10  3 2020-02-03         1
11  3 2020-03-01         1

I would like to create a variable called "Frequency" that calculates the number of times an individual has made a purchase in the past year by summing up all the "Purchased" before the specific Date you see in the data frame.

So, for example, row 3 would get a "Frequency" of 2, since 2017-01-01 and 2017-08-03 both fall within the one-year period before 2017-09-02 (that is, within the interval from 2016-09-02 to 2017-09-01).
See desired output:

   ID       Date Purchased Frequency
1   1 2017-01-01         1         0
2   1 2017-08-03         1         1
3   1 2017-09-02         1         2
4   2 2017-09-04         1         0
5   2 2018-07-12         1         1
6   2 2018-11-03         1         1
7   2 2018-12-05         1         2
8   2 2019-01-01         1         3
9   3 2018-02-03         1         0
10  3 2020-02-03         1         0
11  3 2020-03-01         1         1
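As a quick sanity check of the row 3 example, the window can be computed by hand in base R. This is a minimal sketch that hard-codes ID 1's dates from the table above and assumes "one year" means the same calendar date one year earlier (obtained with `seq()` and `by = "-1 year"`); the variable names are illustrative only.

```r
# Purchase dates for ID 1, copied from the table above
id1_dates <- as.Date(c("2017-01-01", "2017-08-03", "2017-09-02"))

current <- id1_dates[3]                                   # row 3: 2017-09-02
lower   <- seq(current, by = "-1 year", length.out = 2)[2] # same date, one year earlier
upper   <- current - 1                                     # one day before the current date

# Count ID 1's purchases inside the inclusive window [lower, upper]
frequency <- sum(id1_dates >= lower & id1_dates <= upper)
frequency
```

Here `lower` is 2016-09-02, `upper` is 2017-09-01, and the count is 2, which is the Frequency expected for row 3.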

To reproduce the dataframe:

df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
                 Date = as.Date(c('2017-01-01', '2017-08-03', '2017-09-02', '2017-09-04',
                                  '2018-07-12', '2018-11-03', '2018-12-05', '2019-01-01',
                                  '2018-02-03', '2020-02-03', '2020-03-01')),
                 Purchased = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

I've searched on Stack Overflow but haven't been able to find an answer yet that I'm able to apply to my situation and obtain the desired results. One of the things that I found and tried was this:

df$frequency <- sapply(df$Date, function(x) {
  sum(df$Date < x & df$Date >= x - 365)
})

I believe this might give me the results I want if I can find a way to include that it groups by ID (so it sums per ID and not overall). Can't say for sure of course since I haven't been able to test it out. Any help is much appreciated.
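For reference, here is a minimal base-R sketch of that idea: the same `sapply()` loop, but iterating over row indices so that it can also require a matching `ID` and sum `Purchased` instead of just counting dates. It keeps the 365-day window from the attempt above, which can differ by one day from a calendar year when the window spans 29 February, and it compares every row with every other row, so it is only meant for small data.

```r
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  Date = as.Date(c('2017-01-01', '2017-08-03', '2017-09-02', '2017-09-04',
                   '2018-07-12', '2018-11-03', '2018-12-05', '2019-01-01',
                   '2018-02-03', '2020-02-03', '2020-03-01')),
  Purchased = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

# For row i, sum Purchased over rows with the same ID whose Date lies in
# [Date[i] - 365 days, Date[i]) (365 days mirrors the original attempt)
df$Frequency <- sapply(seq_len(nrow(df)), function(i) {
  same_id   <- df$ID == df$ID[i]
  in_window <- df$Date < df$Date[i] & df$Date >= df$Date[i] - 365
  sum(df$Purchased[same_id & in_window])
})

df$Frequency
```

On the sample data this gives 0 1 2 0 1 1 2 3 0 0 1, which matches the desired output.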

Solution

Here's a tidyverse solution:

library(dplyr)
library(purrr)
library(lubridate)

df %>%
  group_by(ID) %>%
  mutate(Frequency = map_dbl(Date, 
                     ~sum(Purchased[between(Date, .x - years(1), .x - 1)]))) %>%
  ungroup

#      ID Date       Purchased Frequency
#   <dbl> <date>         <dbl>     <dbl>
# 1     1 2017-01-01         1         0
# 2     1 2017-08-03         1         1
# 3     1 2017-09-02         1         2
# 4     2 2017-09-04         1         0
# 5     2 2018-07-12         1         1
# 6     2 2018-11-03         1         1
# 7     2 2018-12-05         1         2
# 8     2 2019-01-01         1         3
# 9     3 2018-02-03         1         0
#10     3 2020-02-03         1         0
#11     3 2020-03-01         1         1

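For every Date within each ID, the code sums the Purchased values whose dates fall between the current date minus one year and the current date minus one day. Both ends are included because dplyr's between() is inclusive, so a purchase exactly one year earlier counts, while a purchase on the current date itself does not.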
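The window bounds can be inspected directly. The sketch below shows them for row 3, plus an edge case the sample data does not contain: in lubridate, subtracting `years(1)` from 29 February yields `NA` because the target date does not exist, so for such a row the lower bound, and most likely its Frequency, would come out as `NA`. lubridate's `%m-%` operator rolls back to the last valid day instead; whether 2019-02-28 is the right lower bound for your use case is an assumption you would need to confirm.

```r
library(lubridate)

# Bounds used for row 3 (ID 1, 2017-09-02)
d <- as.Date("2017-09-02")
d - years(1)        # lower bound: 2016-09-02
d - 1               # upper bound: 2017-09-01 (one day before)

# Leap-day edge case: 2019-02-29 does not exist
leap <- as.Date("2020-02-29")
leap - years(1)     # NA
leap %m-% years(1)  # rolls back to 2019-02-28
```

If that edge case matters, one option is to write `.x %m-% years(1)` instead of `.x - years(1)` inside the `mutate()` call.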
