Running sum on a column conditional on value
Question
I have a vector of binary variables which state whether a product is on promotion in each period. I'm trying to work out how to calculate the duration of each promotion and the duration between promotions.
promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
So in other words: if promo.flag is the same as in the previous period, then running.total + 1; otherwise running.total is reset to 1.
I've tried playing with the apply functions and cumsum, but I can't manage to get the conditional reset of the running total working :-(
The output I need is:
promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
rolling.sum = c(1,2,1,1,1,2,1,2,3,1,1,2,0)
Can anybody shed any light on how to achieve this in R?
Recommended answer
It sounds like you need run length encoding (via the rle command in base R).
unlist(sapply(rle(promo.flag)$lengths,seq))
This gives you the vector 1 2 1 1 1 2 1 2 3 1 1 2 1. I'm not sure what you're going for with the zero at the end, but I assume it's a terminal condition and easy to change after the fact.
This works because rle() returns a list with two components, one of which is named lengths and holds a compact sequence of how many times each value is repeated in a row. seq, when fed a single integer, gives you a sequence from 1 to that number. sapply then calls seq once for each of the numbers in rle()$lengths, generating a list of mini sequences. unlist then turns that list into a vector.
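To make the steps above concrete, here is a minimal sketch that breaks the one-liner apart. The intermediate names (runs, per.run, running.total, run.id) are my own and not part of the original answer. It uses lapply with seq_len rather than sapply with seq, on the assumption that you always want a list back: sapply simplifies its result to a matrix when every run happens to have the same length. It also shows two base R alternatives that should give the same result: sequence(), and a cumsum-based version along the lines of what the question attempted.

```r
promo.flag <- c(1,1,0,1,0,0,1,1,1,0,1,1,0)

# Step 1: run length encoding. lengths holds the size of each run,
# values holds the flag that is repeated within that run.
runs <- rle(promo.flag)

# Step 2: one mini sequence 1..n per run. lapply always returns a list.
per.run <- lapply(runs$lengths, seq_len)

# Step 3: flatten the list of mini sequences into a single vector.
running.total <- unlist(per.run)
print(running.total)

# Alternative 1: sequence() builds the concatenated 1..n sequences directly.
running.total2 <- sequence(runs$lengths)

# Alternative 2 (cumsum-based): give each run an id that increases whenever
# the flag changes, then count positions within each run id.
run.id <- cumsum(c(1, diff(promo.flag) != 0))
running.total3 <- ave(promo.flag, run.id, FUN = seq_along)
```

If you also need to know which counts belong to promotions and which to gaps, runs$values can be expanded back to full length with rep(runs$values, runs$lengths) and stored alongside the running total.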