在特定时间段内根据事件日期创建虚拟变量 [英] creating dummy variable based on event date within certain time-period
问题描述
我有一个像这样的数据集 products
:
I have a dataset products
like so:
> head(featured_products)
Dept Class Sku Description Code Vehicle/Placement StartDate EndDate Comments(Circulation,Location,etc)
1: 430 4318 401684 ++INDV RAMEKIN WP 9CM OSM Facebook 2017-01-01 2017-01-29 Fancy Brunch Blog
2: 430 4318 401684 ++INDV RAMEKIN WP 9CM OSM Twitter 2017-01-01 2017-01-29 Fancy Brunch Blog
3: 340 3411 1672605 ++ SPHERE WILLOW 4" OP1 Editorial 2016-02-29 2016-03-27 Spruce up for Spring
4: 230 2311 2114074 ++BOX 30 ISLAND ORCHRD TLIGHTS EM Email 2016-02-17 2016-02-17 Island Orchard and Jeweled Lanterns
5: 895 8957 2118072 ++PAPASAN STL TAUPE OSM Instagram 2017-08-26 2017-10-01 by @audriestorme
6: 895 8957 2118072 ++PAPASAN STL TAUPE EM Email 2017-11-23 2017-11-23 Day 2 Black Friday AM
和另一个数据集 sales
,如下所示:
and another dataset sales
like so:
SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684 2015-12-01 150 0.00 406 2.72
2: 401684 2015-12-02 0 0.00 556 3.79
3: 401684 2015-12-03 0 0.00 723 3.44
4: 401684 2015-12-04 16 4.91 781 2.46
5: 401684 2015-12-05 17 0.00 982 3.18
6: 401684 2015-12-06 0 0.00 851 3.12
现在...如何在 sales
数据集中创建标题为"featured"的标志列?如果 ActivityDate
介于 products
(StartDate,EndDate)中列出的时间之间,则该值为1,否则为0 ...
Now...how can I create a flag column in the sales
dataset entitled "featured"? This should be 1 if the ActivityDate
falls between the times listed in products
(StartDate, EndDate) and 0 otherwise...
我尝试过一些建议的帖子,它们介绍了在 POSIXct
时间之外创建间隔的方法,但是它们似乎都不符合我的需求.
I have tried several suggested posts on creating intervals out of POSIXct
times but none of them really seem to suit my needs.
建议会很好.谢谢你.
推荐答案
这可以使用非装备联接来解决:
library(data.table)
setDT(sales)[, featured := 0][setDT(featured_products),
on = .(SKU, ActivityDate >= StartDate, ActivityDate <= EndDate),
featured := 1][]
SKU ActivityDate featured
1: 401684 2017-01-01 1
2: 401684 2016-03-15 0
3: 1672605 2016-03-22 1
4: 1672605 2017-01-15 0
确保所有与 non-equi join 有关的列,即 ActivityDate
, StartDate
和 EndDate
,具有相同的类型/类别,即 POSIXct
或 Date
或 IDate
,最好是 Date
一天的时间无关紧要.
Make sure that all columns involved in the non-equi join, i.e., ActivityDate
, StartDate
, and EndDate
, are of the same type/class, either POSIXct
or Date
or IDate
, preferably Date
if time of day is not relevant.
featured_products <- data.frame(
SKU = c(401684, 1672605),
StartDate = as.POSIXct(c("2017-01-01", "2016-02-29")),
EndDate = as.POSIXct(c("2017-01-29", "2016-03-27")))
sales <- data.frame(
SKU = c(401684, 401684, 1672605, 1672605),
ActivityDate = as.POSIXct(c("2017-01-01", "2016-03-15", "2016-03-22", "2017-01-15")))
请注意,日期是OP要求的 POSIXct
类.
Note that the dates are of class POSIXct
as requested by the OP.
这篇关于在特定时间段内根据事件日期创建虚拟变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!