Counting/repeating with conditions in R
Problem description
From a question I asked before (Count with conditions in R dataframe), I have the following table:
Week SKU Discount(%) Duration LastDiscount
1 111 5 2 0
2 111 5 2 0
3 111 0 0 0
4 111 10 2 0
5 111 11 2 2
1 222 0 0 0
2 222 10 3 0
3 222 15 3 0
4 222 20 3 0
I want the LastDiscount count to appear on the first row of each new discount period for the same SKU. For example, SKU 111 had a discount in week 2 and its next discount starts in week 4, which gives 2 weeks since the last discount. The problem is that I want that result to appear in week 4, where the next discount campaign starts.
Something like this:
Week SKU Discount(%) Duration LastDiscount
1 111 5 2 0
2 111 5 2 0
3 111 0 0 0
4 111 10 2 2
5 111 11 2 0
1 222 0 0 0
2 222 10 3 0
3 222 15 3 0
4 222 20 3 0
I currently have this code:
df1 %>%
  group_by(SKU) %>%
  mutate(Duration = with(rle(Discount > 0), rep(lengths * values, lengths)),
         temp = with(rle(Discount > 0), sum(values != 0)),
         LastDiscount = if (temp[1] > 1) c(rep(0, n() - 1), temp[1]) else 0) %>%
  select(-temp)
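To see why this attempt puts a 2 in week 5 rather than week 4, it may help to run the `rle()` steps by hand on one SKU. The following is a minimal base R sketch, not part of the original post; the vector `discount` is simply the Discount(%) column for SKU 111 copied from the table, and the variable names are illustrative.

```r
# Discount(%) values for SKU 111, weeks 1 to 5 (copied from the table above)
discount <- c(5, 5, 0, 10, 11)

# Run-length encode the "is discounted" flag
runs <- rle(discount > 0)
runs$lengths   # 2 1 2
runs$values    # TRUE FALSE TRUE

# Duration: the run length on discounted weeks, 0 elsewhere
duration <- rep(runs$lengths * runs$values, runs$lengths)
duration       # 2 2 0 2 2

# temp is the number of discounted runs, not a gap measured in weeks
temp <- sum(runs$values != 0)
temp           # 2

# The attempt then writes temp into the LAST row of the group
last_discount <- if (temp > 1) c(rep(0, length(discount) - 1), temp) else 0
last_discount  # 0 0 0 0 2
```

So the 2 in the first table is a count of discount campaigns that happens to equal the desired gap here, and it lands on the last row of the SKU rather than on the first row of the new campaign.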
Recommended answer
Here is an option using data.table. I will delete it if OP is only looking for a dplyr solution:
# Calculate the duration of each discount period and mark its start ("S") and end ("E")
DT[, c("Duration", "disc_seq") := {
    dur <- sum(`Discount(%)` > 0L)
    disc_seq <- rep("", .N)
    if (dur > 0) {
        disc_seq[1L] <- "S"
        disc_seq[length(disc_seq)] <- "E"
    }
    .(dur, disc_seq)
},
by = .(SKU, rleid(`Discount(%)` > 0L))]
DT[]

# Use a non-equi join to find the end of the previous discount period and
# update LastDiscount on the row that starts the current discount period.
# min() keeps one value per start row (the gap to the most recent earlier end)
# when a SKU has several earlier discount periods.
DT[, LastDiscount := 0L]
DT[disc_seq == "S", LastDiscount := {
    ld <- DT[disc_seq == "E"][.SD, on = .(SKU, Week < Week), by = .EACHI, min(i.Week - x.Week)]$V1
    replace(ld, is.na(ld), 0L)
}]
DT[]
Output:
   Week SKU Discount(%) Duration disc_seq LastDiscount
1:    1 111           5        2        S            0
2:    2 111           5        2        E            0
3:    3 111           0        0                     0
4:    4 111          10        2        S            2
5:    5 111          11        2        E            0
6:    1 222           0        0                     0
7:    2 222          10        3        S            0
8:    3 222          15        3                     0
9:    4 222          20        3        E            0
Data:
library(data.table)
DT <- fread("Week SKU Discount(%)
1 111 5
2 111 5
3 111 0
4 111 10
5 111 11
1 222 0
2 222 10
3 222 15
4 222 20")
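For readers who would rather stay in dplyr, the same idea (find the week in which the previous discount period ended, then report the gap on the first week of the next period) could be sketched roughly as follows. This is not the answer author's code. It assumes the discount column is named `Discount` as in the question's attempt, that `Week` is a positive integer sorted within each SKU, and the helper columns `on_disc`, `run_start`, `run_end` and `prev_end` are names made up for illustration.

```r
library(dplyr)

# Same data as above, with the discount column renamed to Discount (assumption)
df1 <- data.frame(
  Week     = c(1:5, 1:4),
  SKU      = rep(c(111, 222), times = c(5, 4)),
  Discount = c(5, 5, 0, 10, 11, 0, 10, 15, 20)
)

res <- df1 %>%
  group_by(SKU) %>%
  mutate(
    on_disc   = Discount > 0,
    # TRUE on the first week of each discount period
    run_start = on_disc & !lag(on_disc, default = FALSE),
    # TRUE on the last week of each discount period
    run_end   = on_disc & !lead(on_disc, default = FALSE),
    # Week of the most recent period end strictly before this row (0 if none)
    prev_end  = lag(cummax(if_else(run_end, Week, 0L)), default = 0L),
    # Gap in weeks, reported only where a new period starts
    LastDiscount = if_else(run_start & prev_end > 0L, Week - prev_end, 0L)
  ) %>%
  ungroup() %>%
  select(-on_disc, -run_start, -run_end, -prev_end)

res
```

Because the start and end of a period are tracked in separate flags, a campaign that lasts a single week is treated as both a start and an end in this sketch.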