在特定时间段内根据事件日期创建虚拟变量 [英] creating dummy variable based on event date within certain time-period

查看:91
本文介绍了在特定时间段内根据事件日期创建虚拟变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个像这样的数据集 products :

I have a dataset products like so:

> head(featured_products)
   Dept Class     Sku                    Description Code Vehicle/Placement  StartDate    EndDate  Comments(Circulation,Location,etc)
1:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM          Facebook 2017-01-01 2017-01-29                   Fancy Brunch Blog
2:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM           Twitter 2017-01-01 2017-01-29                   Fancy Brunch Blog
3:  340  3411 1672605            ++ SPHERE WILLOW 4"  OP1         Editorial 2016-02-29 2016-03-27                Spruce up for Spring
4:  230  2311 2114074 ++BOX 30 ISLAND ORCHRD TLIGHTS   EM             Email 2016-02-17 2016-02-17 Island Orchard and Jeweled Lanterns
5:  895  8957 2118072            ++PAPASAN STL TAUPE  OSM         Instagram 2017-08-26 2017-10-01                    by @audriestorme
6:  895  8957 2118072            ++PAPASAN STL TAUPE   EM             Email 2017-11-23 2017-11-23               Day 2 Black Friday AM

和另一个数据集 sales ,如下所示:

and another dataset sales like so:

      SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684   2015-12-01                 150                  0.00                  406                   2.72
2: 401684   2015-12-02                   0                  0.00                  556                   3.79
3: 401684   2015-12-03                   0                  0.00                  723                   3.44
4: 401684   2015-12-04                  16                  4.91                  781                   2.46
5: 401684   2015-12-05                  17                  0.00                  982                   3.18
6: 401684   2015-12-06                   0                  0.00                  851                   3.12

现在...如何在 sales 数据集中创建标题为"featured"的标志列?如果 ActivityDate 介于 products (StartDate,EndDate)中列出的时间之间,则该值为1,否则为0 ...

Now...how can I create a flag column in the sales dataset entitled "featured"? This should be 1 if the ActivityDate falls between the times listed in products(StartDate, EndDate) and 0 otherwise...

我尝试过一些建议的帖子,它们介绍了在 POSIXct 时间之外创建间隔的方法,但是它们似乎都不符合我的需求.

I have tried several suggested posts on creating intervals out of POSIXct times but none of them really seem to suit my needs.

建议会很好.谢谢你.

推荐答案

这可以使用非装备联接来解决:

library(data.table)
setDT(sales)[, featured := 0][setDT(featured_products), 
             on = .(SKU, ActivityDate >= StartDate, ActivityDate <= EndDate), 
             featured := 1][]

       SKU ActivityDate featured
1:  401684   2017-01-01        1
2:  401684   2016-03-15        0
3: 1672605   2016-03-22        1
4: 1672605   2017-01-15        0

确保所有与 non-equi join 有关的列,即 ActivityDate StartDate EndDate ,具有相同的类型/类别,即 POSIXct Date IDate ,最好是 Date 一天的时间无关紧要.

Make sure that all columns involved in the non-equi join, i.e., ActivityDate, StartDate, and EndDate, are of the same type/class, either POSIXct or Date or IDate, preferably Date if time of day is not relevant.

featured_products <- data.frame(
  SKU = c(401684, 1672605), 
  StartDate = as.POSIXct(c("2017-01-01", "2016-02-29")), 
  EndDate = as.POSIXct(c("2017-01-29", "2016-03-27")))
sales <- data.frame(
  SKU = c(401684, 401684, 1672605, 1672605), 
  ActivityDate = as.POSIXct(c("2017-01-01", "2016-03-15", "2016-03-22", "2017-01-15")))

请注意,日期是OP要求的 POSIXct 类.

Note that the dates are of class POSIXct as requested by the OP.

这篇关于在特定时间段内根据事件日期创建虚拟变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆