Counting/repeating with conditions in R

Problem description

Following up on a question I asked before (Count with conditions in R dataframe), I have the following table:

  Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            2           0
     5     111      11            2           2
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0

I want the LastDiscount count to appear in the first row of each new discount campaign for the same SKU. For example, SKU 111 had a discount in week 2 and its next discount starts in week 4, which gives 2 weeks since the last discount. The problem is that I want this result to appear in week 4, where the next discount campaign starts, not in week 5 as in the table above.

Something like this:

  Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            2           2
     5     111      11            2           0
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0

I currently have this code:

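library(dplyr)

df1 %>%
  group_by(SKU) %>%
  # Duration: length of each run of consecutive discounted weeks (0 outside a run)
  mutate(Duration = with(rle(Discount > 0), rep(lengths * values, lengths)),
         # temp: number of separate discount runs for this SKU
         temp = with(rle(Discount > 0), sum(values != 0)),
         # writes temp into the last row of the SKU group, all other rows get 0
         LastDiscount = if (temp[1] > 1) c(rep(0, n() - 1), temp[1]) else 0) %>%
  select(-temp)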
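
For reference, a small sketch of what the rle() calls above return for SKU 111, using the discount values from the table. The vector name disc and the helper names runs, duration and n_runs are only for illustration and do not appear in the original code.

```r
# Discounts of SKU 111, weeks 1 to 5, taken from the table in the question
disc <- c(5, 5, 0, 10, 11)

# rle() collapses consecutive equal values into runs
runs <- rle(disc > 0)
runs$lengths  # 2 1 2
runs$values   # TRUE FALSE TRUE

# Duration: repeat each run's length over its rows, zeroed for non-discount runs
duration <- with(runs, rep(lengths * values, lengths))
duration      # 2 2 0 2 2

# temp in the code above: how many discount runs the SKU has
n_runs <- sum(runs$values)
n_runs        # 2
```

Note that n_runs counts discount campaigns, not weeks. For SKU 111 it equals 2, which happens to match the 2-week gap, so the value in LastDiscount looks right here even though it measures something else.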


Recommended answer

Here is an option using data.table. I will delete it if the OP is only looking for a dplyr solution:

#calculate duration of discount and also the start and end of discount period
DT[, c("Duration", "disc_seq") := {
        dur <- sum(`Discount(%)` > 0L)
        disc_seq <- rep("", .N)
        if (dur > 0) {
            disc_seq[1L] <- "S"
            disc_seq[length(disc_seq)] <- "E"
        }
        .(dur, disc_seq)
    }, 
    .(SKU, rleid(`Discount(%)` > 0L))]
DT[]

#use a non-equi join to find the end of previous discount period to update LastDiscount column of the start of current discount period
DT[, LastDiscount := 0L]
DT[disc_seq=="S", LastDiscount := {
        ld <- DT[disc_seq=="E"][.SD, on=.(SKU, Week<Week), by=.EACHI, i.Week - x.Week]$V1
        replace(ld, is.na(ld), 0L)
    }]
DT[]

Output:

   Week SKU Discount(%) Duration disc_seq LastDiscount
1:    1 111           5        2        S            0
2:    2 111           5        2        E            0
3:    3 111           0        0                     0
4:    4 111          10        2        S            2
5:    5 111          11        2        E            0
6:    1 222           0        0                     0
7:    2 222          10        3        S            0
8:    3 222          15        3                     0
9:    4 222          20        3        E            0
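
The grouping key rleid(`Discount(%)` > 0L) is what splits each SKU into separate discount periods. As a rough illustration, here is how it behaves on the discounts of SKU 111 alone; the names disc, run_id and dur are made up for this sketch.

```r
library(data.table)

# Discounts of SKU 111, weeks 1 to 5
disc <- c(5, 5, 0, 10, 11)

# rleid() gives every run of identical consecutive values its own id
run_id <- rleid(disc > 0)
run_id  # 1 1 2 3 3

# Counting the discounted weeks inside each run reproduces the Duration column
dur <- ave(as.integer(disc > 0), run_id, FUN = sum)
dur     # 2 2 0 2 2
```

Because the two discount periods get different ids (1 and 3), grouping by SKU and this id lets the first and last row of each period be marked "S" and "E" independently.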

Data:

library(data.table)
DT <- fread("Week   SKU   Discount(%)
1     111       5
2     111       5
3     111       0
4     111      10
5     111      11
1     222       0
2     222      10
3     222      15
4     222      20")
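
Since the question uses dplyr, the same idea can probably be expressed without a join. The following is only a sketch, not part of the original answer. It assumes the discount column is named Discount (as in the question's code), that rows are sorted by Week within each SKU, and that weeks are consecutive integers. The helper columns in_disc, is_start, is_end and last_end are hypothetical names.

```r
library(dplyr)

# Same data as above, with the discount column renamed to Discount (assumption)
df1 <- data.frame(
  Week     = c(1:5, 1:4),
  SKU      = rep(c(111, 222), times = c(5, 4)),
  Discount = c(5, 5, 0, 10, 11, 0, 10, 15, 20)
)

res <- df1 %>%
  group_by(SKU) %>%
  mutate(
    in_disc  = Discount > 0,
    # first and last week of each run of discounted weeks
    is_start = in_disc & !lag(in_disc, default = FALSE),
    is_end   = in_disc & !lead(in_disc, default = FALSE),
    # week in which the most recent earlier discount run ended (0 if none)
    last_end = lag(cummax(if_else(is_end, Week, 0L)), default = 0L),
    Duration = with(rle(in_disc), rep(lengths * values, lengths)),
    # weeks since the previous run ended, reported only on the run's first row
    LastDiscount = if_else(is_start & last_end > 0L, Week - last_end, 0L)
  ) %>%
  ungroup() %>%
  select(-in_disc, -is_start, -is_end, -last_end)

res$LastDiscount  # 0 0 0 2 0 0 0 0 0
```

On this data the LastDiscount and Duration columns match the data.table output above. Using cummax() on the end weeks keeps only the most recent earlier campaign, so a SKU with three or more campaigns still gets one value per start row.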
