Sliding window on each column of a matrix in R with parallel processing

Question

I have a large matrix with 2000 columns and 3000 rows. For each column, I want to apply a sliding window: sum 15 rows, move down one row, sum the next 15, and so on, then build a new matrix from these sums. I have a function that works (although it seems a bit slow), but I would like to run it in parallel. This is part of a larger script, and if I use apply functions without their parallel equivalents, the open cluster shuts down. Moreover, I have to do this whole operation 100 times. Later in the script, I use parLapply or parSapply.

Here is the code I have in a non-parallel fashion:

# df is a matrix with 2000 columns and 3000 rows, all numeric and no NAs

size <- 15 # size of window
len <- nrow(df) - size + 1 # number of sliding windows to perform 

sumsmatrix <- apply(df, 2, function(x) {
  sapply(1:len, function(y) {
    sum(x[y:(y + size - 1)])
  })
})

Thanks in advance. Ron

Recommended answer

Try using cumsum; that way you won't have to sum the same numbers over and over.

sumsmatrix <- apply(df, 2, function(x) {
  cs <- cumsum(x)  # compute the cumulative sum once per column
  cs[size:nrow(df)] - c(0, cs[seq_len(len - 1)])
})

It should be about 100 times faster than what you were doing, since each column needs only one cumsum pass instead of a separate sum of 15 values for every window.
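A quick way to convince yourself that the two versions agree is to run both on a small matrix. This is only a minimal sketch: the random stand-in data and the names naive and fast are made up for illustration, not taken from the original post. The comparison uses all.equal() rather than identical(), because the two approaches can differ by floating-point rounding.

```r
set.seed(42)
df <- matrix(rnorm(200 * 10), nrow = 200)  # small stand-in for the real matrix
size <- 15                                 # size of window
len <- nrow(df) - size + 1                 # number of windows per column

# Original approach: sum every window explicitly
naive <- apply(df, 2, function(x) {
  sapply(1:len, function(y) sum(x[y:(y + size - 1)]))
})

# cumsum approach: difference of two cumulative sums
fast <- apply(df, 2, function(x) {
  cs <- cumsum(x)
  cs[size:nrow(df)] - c(0, cs[seq_len(len - 1)])
})

# Compare with a numeric tolerance instead of exact equality
all.equal(naive, fast)
```

Wrapping each apply() call in system.time() is an easy way to measure the speed difference on your own data.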

Here is how it works:

To keep it simple, let's say that your x has only 5 elements and your window size is 3.

x <- 1:5
x
# [1] 1 2 3 4 5
cumsum(x)
# [1]  1  3  6 10 15

So, the third number of cumsum(x) is what you want for the first sum, but the fourth and fifth numbers are too big, because they include the first few numbers, which are not part of those windows. So, you just subtract one from the other.

cumsum(x)[3:5]
# [1]  6 10 15
cumsum(x)[1:2]
# [1] 1 3

But for the first window you need to subtract zero, because nothing comes before it.

cumsum(x)[3:5]
# [1]  6 10 15
c(0, cumsum(x)[1:2])
# [1] 0 1 3
cumsum(x)[3:5] - c(0, cumsum(x)[1:2])
# [1]  6  9 12
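The question also asks about running this in parallel. As a rough sketch, assuming a socket cluster created with makeCluster() from the parallel package, the same cumsum logic can be handed to parApply(). The helper name roll_sum, the worker count, and the random stand-in matrix are illustrative choices, not part of the original post. The window size is passed as an argument so the function does not rely on variables that exist only in the master session.

```r
library(parallel)

# Rolling sum of one column via cumsum (hypothetical helper name)
roll_sum <- function(x, size) {
  cs <- cumsum(x)
  n <- length(x)
  cs[size:n] - c(0, cs[seq_len(n - size)])
}

set.seed(1)
df <- matrix(rnorm(300 * 20), nrow = 300)  # small stand-in for the real matrix
size <- 15

cl <- makeCluster(2)  # worker count chosen arbitrarily for this sketch
sums_par <- parApply(cl, df, 2, roll_sum, size = size)
stopCluster(cl)

# Sequential version for comparison
sums_seq <- apply(df, 2, roll_sum, size = size)
all.equal(sums_par, sums_seq)
```

In a larger script that already has a cluster open, that cluster object could be passed to parApply() instead of creating a new one. Because the cumsum version is already fast, the cost of sending the data to the workers may outweigh the gain, so it is worth timing both variants on the real matrix.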
