Put a 0 after not buying in a quarter in R

Question

I have a data set with the variables ID, Month (the period) and Incomes (the income for that month). For every ID, I need to put a 1 in a row if the client makes a purchase in any of the following 3 months, and a 0 if not. For example, if I am in month 1 and there is a purchase in months 2 to 4, that row gets a 1 for that client. In the last periods, where fewer than 3 months remain, an NA should appear.

df <- tibble::tribble(
  ~ID, ~Month, ~Incomes,
       1L,     1L,    5000L,
       1L,     2L,       0L,
       1L,     3L,       0L,
       1L,     4L,       0L,
       1L,     5L,       0L,
       1L,     6L,       0L,
       1L,     7L,     400L,
       1L,     8L,     300L,
       1L,     9L,       0L,
       1L,    10L,       0L,
       1L,    11L,       0L,
       1L,    12L,       0L,
       1L,    13L,     400L,
       2L,     1L,       0L,
       2L,     2L,     100L,
       2L,     3L,       0L,
       2L,     4L,       0L,
       2L,     5L,       0L,
       2L,     6L,       0L,
       2L,     7L,       0L,
       2L,     8L,    1500L,
       2L,     9L,       0L,
       2L,    10L,       0L,
       2L,    11L,       0L,
       2L,    12L,     100L,
       2L,    13L,     750L,
       3L,     1L,       0L,
       3L,     2L,       0L,
       3L,     3L,       0L,
       3L,     4L,       0L,
       3L,     5L,     700L,
       3L,     6L,     240L,
       3L,     7L,     100L,
       3L,     8L,       0L,
       3L,     9L,       0L,
       3L,    10L,       0L,
       3L,    11L,       0L,
       3L,    12L,     500L,
       3L,    13L,     760L
  )
df<-as.data.frame(df)
#     ID Month Incomes
#      1     1    5000
#      1     2       0
#      1     3       0
#      1     4       0
#      1     5       0
#      1     6       0
#      1     7     400
#      1     8     300
#      1     9       0
#      1    10       0
#      1    11       0
#      1    12       0
#      1    13     400
#      2     1       0
#      2     2     100
#      2     3       0
#      2     4       0
#      2     5       0
#      2     6       0
#      2     7       0
#      2     8    1500
#      2     9       0
#      2    10       0
#      2    11       0
#      2    12     100
#      2    13     750
#      3     1       0
#      3     2       0
#      3     3       0
#      3     4       0
#      3     5     700
#      3     6     240
#      3     7     100
#      3     8       0
#      3     9       0
#      3    10       0
#      3    11       0
#      3    12     500
#      3    13     760

I would like the result to look like this:

dffinal<- tibble::tribble(
  ~ID, ~Month, ~Incomes, ~Quarter,
       1L,     1L,    5000L,         0L,
       1L,     2L,       0L,         0L,
       1L,     3L,       0L,         0L,
       1L,     4L,       0L,         1L,
       1L,     5L,       0L,         1L,
       1L,     6L,       0L,         1L,
       1L,     7L,     400L,         1L,
       1L,     8L,     300L,         0L,
       1L,     9L,       0L,         0L,
       1L,    10L,       0L,         0L,
       1L,    11L,       0L,         NA,
       1L,    12L,       0L,         NA,
       1L,    13L,     400L,         NA,
       2L,     1L,       0L,         1L,
       2L,     2L,     100L,         0L,
       2L,     3L,       0L,         0L,
       2L,     4L,       0L,         0L,
       2L,     5L,       0L,         1L,
       2L,     6L,       0L,         1L,
       2L,     7L,       0L,         1L,
       2L,     8L,    1500L,         0L,
       2L,     9L,       0L,         1L,
       2L,    10L,       0L,         1L,
       2L,    11L,       0L,         NA,
       2L,    12L,     100L,         NA,
       2L,    13L,     750L,         NA,
       3L,     1L,       0L,         0L,
       3L,     2L,       0L,         1L,
       3L,     3L,       0L,         1L,
       3L,     4L,       0L,         1L,
       3L,     5L,     700L,         1L,
       3L,     6L,     240L,         1L,
       3L,     7L,     100L,         0L,
       3L,     8L,       0L,         0L,
       3L,     9L,       0L,         1L,
       3L,    10L,       0L,         1L,
       3L,    11L,       0L,         NA,
       3L,    12L,     500L,         NA,
       3L,    13L,     760L,         NA
  )

#     ID Month Incomes   Quarter
#      1     1    5000         0
#      1     2       0         0
#      1     3       0         0
#      1     4       0         1
#      1     5       0         1
#      1     6       0         1
#      1     7     400         1
#      1     8     300         0
#      1     9       0         0
#      1    10       0         0
#      1    11       0        NA
#      1    12       0        NA
#      1    13     400        NA
#      2     1       0         1
#      2     2     100         0
#      2     3       0         0
#      2     4       0         0
#      2     5       0         1
#      2     6       0         1
#      2     7       0         1
#      2     8    1500         0
#      2     9       0         1
#      2    10       0         1
#      2    11       0        NA
#      2    12     100        NA
#      2    13     750        NA
#      3     1       0         0
#      3     2       0         1
#      3     3       0         1
#      3     4       0         1
#      3     5     700         1
#      3     6     240         1
#      3     7     100         0
#      3     8       0         0
#      3     9       0         1
#      3    10       0         1
#      3    11       0        NA
#      3    12     500        NA
#      3    13     760        NA

Does anyone know how to do it? Thanks for your time.

Recommended answer

1) rollapply. Roll forward along Incomes > 0, returning TRUE if any value in the window is TRUE and FALSE otherwise, and convert the result to numeric with unary +. The width list(1:3) means offsets 1, 2, 3 from the current point, i.e. the next three incomes. With fill = NA, rows near the end of each group that have fewer than three months left get NA. Add the partial = TRUE argument to rollapply if you would rather evaluate those rows on the one or two incomes that remain.

library(dplyr)
library(zoo)

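# For each ID, flag whether any of the next three months has Incomes > 0.
df %>%
  group_by(ID) %>%
  # list(1:3) = offsets 1, 2, 3 ahead; unary + turns TRUE/FALSE into 1/0
  mutate(Quarter = +rollapply(Incomes > 0, list(1:3), any, fill = NA)) %>%
  ungroup()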
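
To make the offsets concrete, here is a minimal base-R sketch of the same forward-looking window, written without zoo. The helper name next_k_any and its partial argument are hypothetical and exist only for this illustration; they loosely mirror the rollapply call above rather than reproduce zoo's implementation. The sketch assumes the rows are already sorted by month within each ID, and it chooses to leave the very last row as NA even with partial = TRUE, since no later month exists.

```r
# Hypothetical helper (not part of zoo): for each position i, report whether
# any of the next k values of x is positive.
next_k_any <- function(x, k = 3, partial = FALSE) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    remaining <- n - i                       # values available after position i
    if (remaining < k && !partial) return(NA_integer_)
    if (remaining == 0) return(NA_integer_)  # assumption: nothing ahead means NA
    idx <- (i + 1):min(i + k, n)             # offsets 1..k, clipped at the end
    as.integer(any(x[idx] > 0))
  }, integer(1))
}

# Incomes of ID 2 from the question
id2 <- c(0, 100, 0, 0, 0, 0, 0, 1500, 0, 0, 0, 100, 750)
next_k_any(id2)
# 1 0 0 0 1 1 1 0 1 1 NA NA NA
next_k_any(id2, partial = TRUE)
# 1 0 0 0 1 1 1 0 1 1 1 1 NA

# Per-group use on a small made-up data frame, via base R's ave()
toy <- data.frame(ID = rep(1:2, each = 5),
                  Incomes = c(0, 0, 0, 0, 50, 10, 0, 0, 0, 0))
toy$Quarter <- ave(toy$Incomes, toy$ID, FUN = next_k_any)
```

The loop is slower than rollapply on large data, but it shows exactly which rows each result looks at.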

2) SQL. An SQL solution would be:

library(sqldf)

# Window frame: within each ID, ordered by Month, the next 1 to 3 rows
over <- "partition by ID order by Month rows between 1 following and 3 following"

fn$sqldf("select 
  *, 
  (max(Incomes > 0) over ($over)) + 
    (case when (count(*) over ($over)) = 3 then 0 else Null end) as Quarter
  from df")

This can be simplified if it is acceptable to also compute a value for rows that have fewer than 3 rows following them. over is the string defined above:

fn$sqldf("select *, (max(Incomes > 0) over ($over)) as Quarter from df")
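
The difference between the two queries comes down to how NULL propagates: the first adds 0 when the frame holds exactly 3 rows and NULL otherwise, and a value plus NULL is NULL. The following base-R sketch imitates that arithmetic with NA on a short made-up vector for a single ID; the variable names are illustrative assumptions and it does not call sqldf.

```r
incomes <- c(0, 100, 0, 0, 0)   # made-up incomes for one ID, sorted by month
n <- length(incomes)

# Rows in the frame "1 following to 3 following" (like count(*) over the frame)
frame_size <- pmin(3, n - seq_len(n))

# max(Incomes > 0) over the frame; an empty frame yields NA, like NULL in SQL
frame_max <- vapply(seq_len(n), function(i) {
  if (frame_size[i] == 0) return(NA_integer_)
  as.integer(max(incomes[(i + 1):(i + frame_size[i])] > 0))
}, integer(1))

# Full query: adding NA blanks out rows whose frame has fewer than 3 rows
quarter <- frame_max + ifelse(frame_size == 3, 0L, NA_integer_)

frame_max   # simplified query: 1 0 0 0 NA
quarter     # full query:       1 0 NA NA NA
```

Only the last row is NA in the simplified form, because its frame is empty; the full form also blanks the rows with one or two months left.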
