从最近21天的窗口提取事件类型 [英] Extracting event types from last 21 day window

查看:91
本文介绍了从最近21天的窗口提取事件类型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据框看起来像这样。两个最右边的列是我所需的列。

  **名称ActivityType ActivityDate电子邮件(最近21日)网络研讨会(last21)** 
John电子邮件1/1/2014 NA NA
John网络研讨会1/5/2014 NA NA
约翰销售1/20/2014是是
John网络研讨会3/25/2014 NA NA
约翰出售4/1/2014否是
约翰出售7/1/2014否否
汤姆电子邮件1/1/2015 NA NA
汤姆网路研讨会1/5 / 2015 NA NA
汤姆出售1/20/2015是是
汤姆网络研讨会3/25/2015 NA NA
汤姆出售4/1/2015否是
汤姆销售7/1/2015否否

我只是想创建一个表示是否在过去21天内为每个销售交易发送电子邮件或网络研讨会。我以这种方式使用dplyr的想法(模拟代码):

  custlife%>%
group_by (姓名)%>%
mutate(电子邮件(last21days)=滞后(ifelse(ActivityType =Email& $ b

我不知道如何实现这一点,请帮助,您的帮助真诚感谢!

解决方案

这是一个可能的 data.table 解决方案。数据集 - 一个用于销售,另一个用于其余的活动类型,然后通过滚动窗口21进行加入,同时使用 by = .EACHI 为了检查每个连接中的条件,然后我将结果加入原始数据集。



将日期列转换为日期类并按名称和日期键入数据(最终/滚动加入)

 code> library(data.table)
setk ey(setDT(df)[,ActivityDate:= as.IDate(ActivityDate,%m /%d /%Y)],Name,ActivityDate)

每个活动创建2个临时数据集

  Saletemp<  -  df [ActivityType ==Sale,。(Name,ActivityDate)] 
Elsetemp< - df [ActivityType!=Sale,。(Name,ActivityDate,ActivityType)]

检查条件时,通过滚动窗口加入销售临时数据集

  Saletemp [Elsetemp,`:=`(Email21 = as.logical(which(i.ActivityType ==Email)),
Webinar21 = as.logical(which i.ActivityType ==Webinar))),
roll = -21,by = .EACHI]

加入所有内容

  df [Saletemp,`:=`(Email21 = i.Email21,Webinar21 = i .Webinar21)] 
df
#名称活动类型活动日期Email21网络研讨会21
#1:约翰电子邮件2014-01-01 NA NA
#2:约翰网络研讨会2014-01-05 NA NA
#3:John Sale 2014-01-20 TRUE TRUE
#4:John Webinar 2014-03-25 NA NA
#5:John Sale 2014 -04-01 NA TRUE
#6:John Sale 2014-07-01 NA NA
#7:Tom Email 2015-01-01 NA NA
#8:Tom Webinar 2015-01 -05 NA NA
#9:Tom Sale 2015-01-20 TRUE TRUE
#10:Tom Webinar 2015-03-25 NA NA
#11:Tom Sale 2015-04-01 NA TRUE
#12:Tom Sale 2015-07-01 NA NA


My dataframe looks like this. The two rightmost columns are my desired columns.

**Name      ActivityType     ActivityDate   Email(last 21 says)  Webinar(last21)**             
John       Email            1/1/2014        NA                   NA   
John       Webinar          1/5/2014        NA                   NA
John       Sale             1/20/2014       Yes                  Yes
John       Webinar          3/25/2014       NA                   NA
John       Sale             4/1/2014        No                   Yes
John       Sale             7/1/2014        No                   No
Tom        Email            1/1/2015        NA                   NA   
Tom        Webinar          1/5/2015        NA                   NA
Tom        Sale             1/20/2015       Yes                  Yes
Tom        Webinar          3/25/2015       NA                   NA
Tom        Sale              4/1/2015        No                   Yes
Tom        Sale              7/1/2015        No                   No                

I am just trying to create a yes/no variable that denotes whether there was an email or a webinar in the last 21 days for each "Sale" transaction. I was thinking(mock code) along the lines of using dplyr this way:

custlife %>% 
group_by(Name) %>% 
 mutate(Email(last21days)=lag(ifelse(ActivityType = "Email" & ActivityDate of email within (activity date of sale - 21),Yes,No)).

I am not sure of the way to implement this. Kindly help. Your help is sincerely appreciated!

解决方案

Here's a possible data.table solution. Here I'm creating 2 temporary data sets- one for Sale and one for the rest of activity types and then joining between them by a rolling window of 21 while using by = .EACHI in order to check conditions in each join. Then, I'm joining the result to the original data set.

Convert the date column to Date class and key the data by Name and Date (for the final/rolling join)

library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)

Create 2 temporary data sets per each activity

Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]

Join by a rolling window of 21 to the sales temporary data set while checking conditions

Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")), 
                        Webinar21 = as.logical(which(i.ActivityType == "Webinar"))), 
         roll = -21, by = .EACHI]

Join everything back

df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
df
#     Name ActivityType ActivityDate Email21 Webinar21
#  1: John        Email   2014-01-01      NA        NA
#  2: John      Webinar   2014-01-05      NA        NA
#  3: John         Sale   2014-01-20    TRUE      TRUE
#  4: John      Webinar   2014-03-25      NA        NA
#  5: John         Sale   2014-04-01      NA      TRUE
#  6: John         Sale   2014-07-01      NA        NA
#  7:  Tom        Email   2015-01-01      NA        NA
#  8:  Tom      Webinar   2015-01-05      NA        NA
#  9:  Tom         Sale   2015-01-20    TRUE      TRUE
# 10:  Tom      Webinar   2015-03-25      NA        NA
# 11:  Tom         Sale   2015-04-01      NA      TRUE
# 12:  Tom         Sale   2015-07-01      NA        NA

这篇关于从最近21天的窗口提取事件类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆