Running sum on a column conditional on value
Question
I have a vector of binary variables which state whether a product is on promotion in each period. I'm trying to work out how to calculate the duration of each promotion and the duration between promotions.
promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
So in other words: if promo.flag is the same as in the previous period then running.total + 1, else running.total is reset to 1.
I've tried playing with the apply functions and cumsum but can't manage to get the conditional reset of the running total working :-(
The output I need is:
promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
rolling.sum = c(1,2,1,1,1,2,1,2,3,1,1,2,0)
Can anybody shed any light on how to achieve this in R?
Recommended answer
It sounds like you need run length encoding (via the rle command in base R).
unlist(sapply(rle(promo.flag)$lengths,seq))
This gives you the vector 1 2 1 1 1 2 1 2 3 1 1 2 1. I'm not sure what you're going for with the zero at the end, but I assume it's a terminal condition and easy to change after the fact.
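If the trailing zero really is meant as a terminal marker, one way to apply it after the fact is to overwrite the last element. This is only a sketch under that assumption; the question does not say what the final 0 stands for, and the name running.total is borrowed from the question's wording.

```r
promo.flag <- c(1,1,0,1,0,0,1,1,1,0,1,1,0)

# Running count within each run of identical values (same call as above)
running.total <- unlist(sapply(rle(promo.flag)$lengths, seq))

# Assumption: the last period should be reported as 0 (a terminal marker)
running.total[length(running.total)] <- 0

running.total
```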
This works because rle() returns a list of two components, one of which is named lengths and contains a compact sequence of how many times each value is repeated. seq, when fed a single integer, gives you a sequence from 1 to that number. sapply then repeatedly calls seq with the single numbers in rle()$lengths, generating a list of the mini sequences. unlist then turns that list into a vector.
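To make those steps concrete, here is a small sketch that breaks the one-liner apart. The variable names (runs, pieces, promo.durations, gap.durations) are just illustrative. It uses lapply rather than sapply, since sapply may simplify its result to a matrix when every run happens to have the same length, and it also shows base R's sequence(), which should give the same result in a single call.

```r
promo.flag <- c(1,1,0,1,0,0,1,1,1,0,1,1,0)

# Step 1: run length encoding
runs <- rle(promo.flag)
runs$lengths   # how long each run is
runs$values    # which value (0 or 1) each run holds

# Step 2: turn each run length n into the sequence 1..n
pieces <- lapply(runs$lengths, seq)   # lapply always returns a list

# Step 3: flatten the list of mini sequences into one vector
running.total <- unlist(pieces)

# Alternative: sequence() concatenates 1..n for each n in one call
alt <- sequence(runs$lengths)

# The run lengths also answer the duration question directly
promo.durations <- runs$lengths[runs$values == 1]  # length of each promotion
gap.durations   <- runs$lengths[runs$values == 0]  # length of each gap
```

Splitting the run lengths by runs$values separates promotion runs from the gaps between them, which is the duration information the question asks for.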