Running sum on a column conditional on value

Question

I have a vector of binary variables which state whether a product is on promotion in the period. I'm trying to work out how to calculate the duration of each promotion and the duration between promotions.

promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)

So in other words: if promo.flag is the same as in the previous period, then running.total increases by 1; otherwise running.total is reset to 1.

I've tried playing with apply functions and cumsum but can't manage to get the conditional reset of running total working :-(

The output I need is:

promo.flag =  c(1,1,0,1,0,0,1,1,1,0,1,1,0)
rolling.sum = c(1,2,1,1,1,2,1,2,3,1,1,2,0)

Can anybody shed any light on how to achieve this in R?

Recommended answer

It sounds like you need run length encoding (via the rle command in base R).

unlist(sapply(rle(promo.flag)$lengths,seq))

Gives you a vector 1 2 1 1 1 2 1 2 3 1 1 2 1. Not sure what you're going for with the zero at the end, but I assume it's a terminal condition and easy to change after the fact.
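As a hedged alternative to the one-liner above, base R's sequence() builds the same concatenated 1..n runs directly from the run lengths, and a grouping built with cumsum() gets the conditional reset the question was after. This is only a sketch: the names running.total, run.id and running.total2 are made up for illustration, and the trailing 0 in the desired output is not reproduced because its intent is unclear.

```r
promo.flag <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0)

# Variant 1: sequence() concatenates seq_len(n) for every run length n.
running.total <- sequence(rle(promo.flag)$lengths)
running.total
# 1 2 1 1 1 2 1 2 3 1 1 2 1

# Variant 2: give each run an id that increases whenever the flag changes,
# then count positions within each run.
run.id <- cumsum(c(1, diff(promo.flag) != 0))
running.total2 <- ave(promo.flag, run.id, FUN = seq_along)
running.total2
# 1 2 1 1 1 2 1 2 3 1 1 2 1
```

Both variants return a plain vector regardless of the run lengths, whereas sapply() simplifies its result to a matrix when every run happens to have the same length.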

This works because rle() returns a list with two components, one of which is named lengths and holds the length of each run of repeated values. When given a single positive integer, seq returns the sequence from 1 to that number. sapply then calls seq on each number in rle()$lengths, generating the mini sequences (returned as a list here, since the runs differ in length). unlist then turns that list into a vector.
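The steps described above can be traced one at a time. This is a sketch rather than the answer's exact code: lapply and seq_len are substituted for sapply and seq so that the intermediate result is always a list, and the names runs, mini, promo.durations and gap.durations are made up for illustration. The last two lines use the values component of rle() to read off the duration of each promotion and of each gap between promotions, which is what the question originally asks about.

```r
promo.flag <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0)

# Step 1: run-length encode the flags.
runs <- rle(promo.flag)
runs$lengths  # 2 1 1 2 3 1 2 1  (how long each run is)
runs$values   # 1 0 1 0 1 0 1 0  (which flag each run holds)

# Step 2: expand each run length n into the mini sequence 1..n.
mini <- lapply(runs$lengths, seq_len)

# Step 3: flatten the list of mini sequences into one vector.
running.total <- unlist(mini)
running.total
# 1 2 1 1 1 2 1 2 3 1 1 2 1

# Durations of promotions (flag == 1) and of gaps between them (flag == 0).
promo.durations <- runs$lengths[runs$values == 1]  # 2 1 3 2
gap.durations   <- runs$lengths[runs$values == 0]  # 1 2 1 1
```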
