Sliding window on each column of a matrix in R with parallel processing


Question

I have a large matrix with 2000 columns and 3000 rows. For each column, I want to run a sliding window that sums 15 rows, moves down one row, sums the next 15, and so on, and then build a new matrix from these sums. I have a function that works (although it seems a bit slow), but I would like to run it in parallel: this is part of a larger script, and if I use apply functions without their parallel equivalents, the open cluster shuts down. Moreover, I have to do this whole operation 100 times. Later in the script, I use parLapply or parSapply.

Here is the code I have, in non-parallel form:

# df is a matrix with 2000 columns and 3000 rows, all numeric and no NAs

size <- 15                  # size of window
len <- nrow(df) - size + 1  # number of sliding windows to perform

sumsmatrix <- apply(df, 2, function(x) {
  result <- sapply(1:len, function(y) {
    sum(x[y:(y + size - 1)])
  })
  return(result)
})

Thanks in advance. Ron

Recommended answer

Try using cumsum, so that you don't have to sum the same numbers over and over again.

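sumsmatrix <- apply(df, 2, function(x) {
  cs <- cumsum(x)  # compute the cumulative sum once per column
  # seq_len() also behaves correctly when len is 1 (1:(len-1) would not)
  cs[size:nrow(df)] - c(0, cs[seq_len(len - 1)])
})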

It should be about 100 times faster than what you were doing.
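The speed-up depends on the machine and the data, so it is worth checking on your own setup. The following is a minimal sketch, assuming a random stand-in matrix with fewer columns than the real one; the names naive, fast, res_naive and res_fast are made up for this illustration. It confirms that both approaches give the same result and prints the elapsed times.

```r
# Stand-in data (assumption): 3000 rows, but only 200 columns to keep the naive run short
set.seed(42)
df <- matrix(rnorm(3000 * 200), nrow = 3000)
size <- 15
len <- nrow(df) - size + 1

# Original approach: re-sum every window from scratch
naive <- function(m) {
  apply(m, 2, function(x) {
    sapply(seq_len(len), function(y) sum(x[y:(y + size - 1)]))
  })
}

# cumsum approach: one cumulative sum per column, then a vectorised subtraction
fast <- function(m) {
  apply(m, 2, function(x) {
    cs <- cumsum(x)
    cs[size:nrow(m)] - c(0, cs[seq_len(len - 1)])
  })
}

t_naive <- system.time(res_naive <- naive(df))
t_fast <- system.time(res_fast <- fast(df))

# Floating-point results, so compare with a tolerance rather than identical()
print(all.equal(res_naive, res_fast))
print(c(naive = t_naive[["elapsed"]], cumsum = t_fast[["elapsed"]]))
```

Because the cumsum version subtracts two running totals, tiny rounding differences from the naive sums are possible, which is why the comparison uses all.equal.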

Here is how it works. To keep things simple, let's say that your x has only 5 elements and your window size is 3.

x <- 1:5
x
# [1] 1 2 3 4 5
cumsum(x)
# [1]  1  3  6 10 15

So, the third number of cumsum(x) is what you want for the first sum, but the fourth and fifth numbers are too big, because they include the elements that come before the window. To correct this, you subtract the cumulative sum just before each window starts from the cumulative sum at its end.

cumsum(x)[3:5]
# [1]  6 10 15
cumsum(x)[1:2]
# [1] 1 3

But for the first window there is nothing to remove, so you need to subtract zero.

cumsum(x)[3:5]
# [1]  6 10 15
c(0, cumsum(x)[1:2])
# [1] 0 1 3
cumsum(x)[3:5] - c(0, cumsum(x)[1:2])  # 1+2+3, 2+3+4, 3+4+5
# [1]  6  9 12
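The question also asks for a parallel version so that the work can run on an already open cluster. One possible sketch, not part of the original answer, uses parApply from the parallel package, which is the cluster counterpart of apply. The helper window_sums, the two-worker cluster and the small stand-in matrix are assumptions made for illustration only.

```r
library(parallel)

# Hypothetical helper: sliding-window sums of one numeric vector via cumsum
window_sums <- function(x, size) {
  n <- length(x)
  stopifnot(size >= 1, size <= n)
  cs <- cumsum(x)
  cs[size:n] - c(0, cs[seq_len(n - size)])
}

# Stand-in data (assumption): much smaller than the 3000 x 2000 matrix in the question
set.seed(1)
df <- matrix(rnorm(300 * 20), nrow = 300, ncol = 20)
size <- 15

# Serial reference result
sumsmatrix <- apply(df, 2, window_sums, size = size)

# Parallel version; in the real script the existing cluster object would be reused
cl <- makeCluster(2)
sumsmatrix_par <- parApply(cl, df, 2, window_sums, size = size)
stopCluster(cl)

print(all.equal(sumsmatrix, sumsmatrix_par))
```

Since the cumsum version is already cheap, the cost of shipping the matrix to the workers may outweigh any gain, so it is worth timing both variants before committing to the parallel one.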
