运行长度的累积和。这个循环可以向量化吗? [英] Cumulative sums over run lengths. Can this loop be vectorized?

查看:125
本文介绍了运行长度的累积和。这个循环可以向量化吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据框,我计算一个特定列的运行长度编码。列 dir 的值为-1,0或1。

I have a data frame on which I calculate a run length encoding for a specific column. The values of the column, dir, are either -1, 0, or 1.

dir.rle <-rle(df $ dir)

然后,我将运行长度并计算另一列的分段累积和在数据帧中。我使用一个for循环,但我觉得应该有一个方法来做这更聪明。

I then take the run lengths and compute segmented cumulative sums across another column in the data frame. I'm using a for loop, but I feel like there should be a way to do this more intelligently.

ndx <- 1
for(i in 1:length(dir.rle$lengths)) {
    l <- dir.rle$lengths[i] - 1
    s <- ndx
    e <- ndx+l
    tmp[s:e,]$cumval <- cumsum(df[s:e,]$val)
    ndx <- e + 1
}

dir 的运行时长定义开始, s ,结束, e 。上面的代码工作,但它不觉得像惯用的R代码。我觉得应该有另一种方法来做没有循环。

The run lengths of dir define the start, s, and end, e, for each run. The above code works but it does not feel like idiomatic R code. I feel as if there should be another way to do it without the loop.

推荐答案

这可以分为两步问题。首先,如果我们基于 rle 创建一个索引列,那么我们可以使用它来分组并运行 cumsum 。然后可以通过任何数量的聚合技术来执行组。我将显示两个选项,一个使用 data.table ,另一个使用 plyr

This can be broken down into a two step problem. First, if we create an indexing column based off of the rle, then we can use that to group by and run the cumsum. The group by can then be performed by any number of aggregation techniques. I'll show two options, one using data.table and the other using plyr.

library(data.table)
library(plyr)
#data.table is the same thing as a data.frame for most purposes
#Fake data
dat <- data.table(dir = sample(-1:1, 20, TRUE), value = rnorm(20))
dir.rle <- rle(dat$dir)
#Compute an indexing column to group by
dat <- transform(dat, indexer = rep(1:length(dir.rle$lengths), dir.rle$lengths))


#What does the indexer column look like?
> head(dat)
     dir      value indexer
[1,]   1  0.5045807       1
[2,]   0  0.2660617       2
[3,]   1  1.0369641       3
[4,]   1 -0.4514342       3
[5,]  -1 -0.3968631       4
[6,]  -1 -2.1517093       4


#data.table approach
dat[, cumsum(value), by = indexer]

#plyr approach
ddply(dat, "indexer", summarize, V1 = cumsum(value))

这篇关于运行长度的累积和。这个循环可以向量化吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆