Cumulative sums over run lengths. Can this loop be vectorized?
Question
I have a data frame on which I calculate a run-length encoding for a specific column. The values of that column, dir, are either -1, 0, or 1.
```r
dir.rle <- rle(df$dir)
```
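For readers unfamiliar with rle(): it returns a list with lengths (how long each run is) and values (the value repeated in that run). A small illustration, using a hypothetical dir vector rather than the asker's data:

```r
# Hypothetical dir column with runs 1,1 | 0 | -1,-1,-1 | 1
dir <- c(1, 1, 0, -1, -1, -1, 1)
dir.rle <- rle(dir)
dir.rle$lengths       # 2 1 3 1  (length of each run)
dir.rle$values        # 1 0 -1 1 (value repeated within each run)
inverse.rle(dir.rle)  # expands back to the original dir vector
```

The lengths vector is all the loop below needs: it determines which rows belong to which run.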
I then take the run lengths and compute segmented cumulative sums across another column in the data frame. I'm using a for loop, but I feel like there should be a way to do this more intelligently.
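```r
# tmp is assumed to be a copy of df with a cumval column (not shown in the question)
tmp <- df
tmp$cumval <- NA_real_
ndx <- 1  # first row of the current run
for (i in seq_along(dir.rle$lengths)) {
  l <- dir.rle$lengths[i] - 1
  s <- ndx      # start row of run i
  e <- ndx + l  # end row of run i
  tmp$cumval[s:e] <- cumsum(df$val[s:e])
  ndx <- e + 1  # the next run starts right after this one
}
```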
The run lengths of dir define the start, s, and end, e, of each run. The above code works, but it does not feel like idiomatic R. I feel as if there should be another way to do it without the loop.
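Because the run lengths fully determine s and e, the ndx bookkeeping in the loop can be replaced by cumsum() over the lengths. The same end indices also allow a loop-free segmented sum: take the running total over all rows and subtract the running total at the end of the previous run. That subtraction trick is an alternative technique, not taken from the question or the answer. A minimal sketch on hypothetical toy data, assuming the df$dir and df$val columns from the question:

```r
# Hypothetical toy data: dir has runs 1,1 | 0 | -1,-1,-1 | 1
df <- data.frame(dir = c(1, 1, 0, -1, -1, -1, 1),
                 val = c(1, 2, 3, 4, 5, 6, 7))
dir.rle <- rle(df$dir)
run_len <- dir.rle$lengths

e <- cumsum(run_len)    # last row of each run:  2 3 6 7
s <- e - run_len + 1L   # first row of each run: 1 3 4 7

# Loop-free segmented cumsum: running total over all rows minus the
# running total at the end of the previous run (0 for the first run)
cs <- cumsum(df$val)
offset <- rep(c(0, cs[e[-length(e)]]), run_len)
df$cumval <- cs - offset  # 1 3 3 4 9 15 7
```

With large floating-point totals the subtraction can lose some precision compared with a fresh cumsum() per run, which is one reason to prefer the grouped approaches.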
Answer
This can be broken down into a two-step problem. First, create an indexing column based on the rle that gives every run its own id; then group by that column and run cumsum within each group. The grouping itself can be performed with any number of aggregation techniques. I'll show two options, one using data.table and the other using plyr.
```r
library(data.table)
library(plyr)

# A data.table behaves like a data.frame for most purposes
# Fake data
dat <- data.table(dir = sample(-1:1, 20, TRUE), value = rnorm(20))
dir.rle <- rle(dat$dir)

# Compute an indexing column to group by: one id per run,
# repeated over the rows of that run
dat <- transform(dat, indexer = rep(seq_along(dir.rle$lengths), dir.rle$lengths))

# What does the indexer column look like?
head(dat)
#      dir      value indexer
# [1,]   1  0.5045807       1
# [2,]   0  0.2660617       2
# [3,]   1  1.0369641       3
# [4,]   1 -0.4514342       3
# [5,]  -1 -0.3968631       4
# [6,]  -1 -2.1517093       4

# data.table approach: cumsum(value) within each run
dat[, cumsum(value), by = indexer]

# plyr approach
ddply(dat, "indexer", summarize, V1 = cumsum(value))
```
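The same grouping also works in base R without extra packages, using ave() with the indexer built above. A minimal sketch, assuming a plain data.frame with the answer's fake-data columns; set.seed(1) is added only for reproducibility:

```r
# Base R sketch: no data.table or plyr needed
set.seed(1)  # added for reproducibility; not in the original answer
dat <- data.frame(dir = sample(-1:1, 20, TRUE), value = rnorm(20))
dir.rle <- rle(dat$dir)

# One integer id per run, repeated over the rows of that run
indexer <- rep(seq_along(dir.rle$lengths), dir.rle$lengths)

# ave() applies FUN within each group and returns a vector in the original row order
dat$cumval <- ave(dat$value, indexer, FUN = cumsum)
head(dat)
```

Unlike the data.table and plyr calls above, which return a new table of indexer and V1, ave() keeps the result aligned with the original rows, so it can be stored directly as a column.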