Count with conditions in R dataframe

Question

I have the following data frame:

    Week   SKU   Discount(%)
     1     111       5
     2     111       5
     3     111       0
     4     111      10
     1     222       0
     2     222      10
     3     222      15
     4     222      20
     1     333       5
     2     333       0
     3     333       0

I would like to get this result:

    Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            1           2
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0
     1     333       5            1           0
     2     333       0            0           0
     3     333       0            0           0

Duration is the number of consecutive weeks for which a SKU was discounted. LastDiscount counts the number of weeks since the last time the SKU was on a continuous discount, and is filled in only when weeks with a 0 discount fall between the two discount periods.

Recommended answer

One option for "Duration" is, after grouping by "SKU", to apply rle (run-length encoding) to the logical vector Discount > 0, multiply the run lengths by the run values, and replicate each product by its run length. Similarly, "LastDiscount" can be obtained by taking the sum of the logical run values.

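library(dplyr)
df1 %>%
  group_by(SKU) %>%
  mutate(
    # Length of each discount run, repeated over its rows (0 for non-discount runs)
    Duration = with(rle(Discount > 0), rep(lengths * values, lengths)),
    # Number of separate discount runs within this SKU
    temp = with(rle(Discount > 0), sum(values != 0)),
    # With more than one run, put that count on the SKU's last row; otherwise 0
    LastDiscount = if (temp[1] > 1) c(rep(0, n() - 1), temp[1]) else 0
  ) %>%
  select(-temp)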
# A tibble: 11 x 5
# Groups:   SKU [3]
#    Week   SKU Discount Duration LastDiscount
#   <int> <int>    <int>    <int>        <dbl>
# 1     1   111        5        2            0
# 2     2   111        5        2            0
# 3     3   111        0        0            0
# 4     4   111       10        1            2
# 5     1   222        0        0            0
# 6     2   222       10        3            0
# 7     3   222       15        3            0
# 8     4   222       20        3            0
# 9     1   333        5        1            0
#10     2   333        0        0            0
#11     3   333        0        0            0
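
To make the rle step concrete, here is a small base R sketch that runs it by hand on the four weeks of SKU 111 from the question. The variable names (discount, r, duration, n_runs) are illustrative only and do not appear in the answer.

```r
# Discounts for SKU 111 from the question, in week order
discount <- c(5L, 5L, 0L, 10L)

# Run-length encode the logical "is discounted" vector
r <- rle(discount > 0)
r$lengths  # 2 1 1
r$values   # TRUE FALSE TRUE

# lengths * values zeroes out the non-discount runs (FALSE counts as 0);
# rep(..., lengths) then stretches each run back to one value per row
duration <- rep(r$lengths * r$values, r$lengths)
duration   # 2 2 0 1

# Number of separate discount runs, which the answer stores in temp
n_runs <- sum(r$values)
n_runs     # 2
```

Because rle only sees the order of the values it is given, this relies on the rows being sorted by Week within each SKU, as they are in the sample data.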

Or, using data.table:

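library(data.table)
# 1. grp: run id of Discount > 0 within each SKU (rleid)
# 2. Duration: size of each discount run (rows with no discount stay NA)
# 3. LastDiscount: number of distinct discount runs per SKU
# 4. i1: index of the last discounted row of each SKU with more than one run
i1 <- setDT(df1)[, grp := rleid(Discount > 0), SKU][Discount > 0,
  Duration := .N,  .(grp, SKU)][, 
   LastDiscount := uniqueN(grp[Discount > 0]), .(SKU)][, 
   tail(.I[Discount > 0 & LastDiscount > 1], 1), SKU]$V1
# Set LastDiscount to 0 on every other row, then print the result
df1[-i1, LastDiscount := 0][]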
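
Both solutions above set LastDiscount to the number of discount runs in the SKU, which happens to equal 2 for SKU 111 in the sample. As a point of comparison, the following base R sketch (no packages) follows one possible reading of the question's wording instead: on the first week of a discount run, LastDiscount is the number of weeks since the SKU's previous discounted week. That reading is an assumption, since the sample contains only one nonzero value, and the helpers run_duration and weeks_since_last are hypothetical names, not part of the answer.

```r
# Data from the question (rows already sorted by Week within each SKU)
df1 <- data.frame(
  Week     = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L),
  SKU      = c(111L, 111L, 111L, 111L, 222L, 222L, 222L, 222L, 333L, 333L, 333L),
  Discount = c(5L, 5L, 0L, 10L, 0L, 10L, 15L, 20L, 5L, 0L, 0L)
)

# Hypothetical helper: length of the discount run each row belongs to, 0 if none
run_duration <- function(discount) {
  r <- rle(discount > 0)
  rep(r$lengths * r$values, r$lengths)
}

# Hypothetical helper (assumed reading of the question): on the first week of a
# discount run, weeks since the previous discounted week; 0 on every other row
weeks_since_last <- function(week, discount) {
  on <- discount > 0
  out <- integer(length(on))
  last_on <- NA_integer_          # week of the most recent discounted row seen
  for (i in seq_along(on)) {
    if (on[i]) {
      starts_run <- i == 1L || !on[i - 1L]
      if (starts_run && !is.na(last_on)) out[i] <- week[i] - last_on
      last_on <- week[i]
    }
  }
  out
}

# Apply both helpers SKU by SKU, then restore the original row order
by_sku <- split(df1, df1$SKU)
df1$Duration <- unsplit(
  lapply(by_sku, function(d) run_duration(d$Discount)), df1$SKU
)
df1$LastDiscount <- unsplit(
  lapply(by_sku, function(d) weeks_since_last(d$Week, d$Discount)), df1$SKU
)
df1
```

On the sample data this reproduces the Duration and LastDiscount columns requested in the question. The two readings can diverge on other data, for example when a SKU has three or more discount runs.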



Data

df1 <- structure(list(Week = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L), SKU = c(111L, 111L, 111L, 111L, 222L, 222L, 222L, 222L, 
333L, 333L, 333L), Discount = c(5L, 5L, 0L, 10L, 0L, 10L, 15L, 
20L, 5L, 0L, 0L)), class = "data.frame", row.names = c(NA, -11L
))
