Cumulative sums over run lengths. Can this loop be vectorized?


Question

I have a data frame on which I calculate a run length encoding for a specific column. The values of that column, dir, are -1, 0, or 1.

dir.rle <- rle(df$dir)

I then take the run lengths and compute segmented cumulative sums over another column of the data frame. I'm using a for loop, but I feel there should be a smarter way to do this.

ndx <- 1
for(i in 1:length(dir.rle$lengths)) {
    l <- dir.rle$lengths[i] - 1
    s <- ndx
    e <- ndx+l
    tmp[s:e,]$cumval <- cumsum(df[s:e,]$val)
    ndx <- e + 1
}

The run lengths of dir define the start, s, and end, e, of each run. The code above works, but it doesn't feel like idiomatic R. I feel there should be a way to do it without the loop.
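As a side note, the start and end positions that the loop tracks through ndx can be read directly off the run lengths. The following is only a minimal sketch on a made-up dir vector (not the data from the question), reusing the names s and e from the loop:

```r
# Made-up example vector; in the question this would be df$dir
dir <- c(1, 1, 0, -1, -1, -1, 1)
dir.rle <- rle(dir)

# The end of each run is the running total of the run lengths;
# the start is that end minus the run length, plus one
e <- cumsum(dir.rle$lengths)
s <- e - dir.rle$lengths + 1

# One row per run: start index and end index
cbind(s, e)
```

Here the runs have lengths 2, 1, 3 and 1, so the rows of the result are (1, 2), (3, 3), (4, 6) and (7, 7).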

Recommended answer

This breaks down into two steps. First, build an indexing column from the rle output, so that every run gets its own id. Second, group by that column and run cumsum within each group. The grouping can be done with any of several aggregation tools; I'll show two options, one using data.table and the other using plyr.

library(data.table)
library(plyr)
# A data.table behaves like a data.frame for most purposes
# Fake data
dat <- data.table(dir = sample(-1:1, 20, TRUE), value = rnorm(20))
dir.rle <- rle(dat$dir)
# Compute an indexing column to group by: one id per run, repeated for the run's length
dat <- transform(dat, indexer = rep(seq_along(dir.rle$lengths), dir.rle$lengths))


#What does the indexer column look like?
> head(dat)
     dir      value indexer
[1,]   1  0.5045807       1
[2,]   0  0.2660617       2
[3,]   1  1.0369641       3
[4,]   1 -0.4514342       3
[5,]  -1 -0.3968631       4
[6,]  -1 -2.1517093       4


#data.table approach
dat[, cumsum(value), by = indexer]

#plyr approach
ddply(dat, "indexer", summarize, V1 = cumsum(value))
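For comparison, the same two-step idea can also be sketched in base R alone, with ave() doing the grouping. This is not part of the original answer. The toy data and the name run_id are assumptions made for illustration, and the column names dir, val and cumval follow the question:

```r
# Toy data (made up for illustration); columns named as in the question
df <- data.frame(dir = c(1, 1, 0, -1, -1, -1, 1),
                 val = c(2, 3, 5, 1, 1, 1, 4))

# Step 1: one id per run, repeated for the length of that run
r <- rle(df$dir)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Step 2: cumulative sum within each run; ave() returns the results
# in the original row order, so they can be assigned straight back
df$cumval <- ave(df$val, run_id, FUN = cumsum)

df
```

On this toy data run_id is 1, 1, 2, 3, 3, 3, 4 and cumval is 2, 5, 5, 1, 2, 3, 4: the sum restarts whenever dir changes.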
