Moving window method to aggregate data

Question

I have the following matrix:

 mat<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
       2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
       0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
       0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
       0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
       1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16, ncol=6)
 dimnames(mat)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
              c("1", "2", "3", "4", "5", "6"))

I need to aggregate columns using a moving window method. First, the window size will be 2, so that the window consists of 2 columns. Row sums are taken for this aggregate. The window then shifts by one column and row sums are taken again. For the example matrix provided, the first window combines columns 1 and 2, the second window combines columns 2 and 3, then 3 and 4, then 4 and 5, and finally 5 and 6.

These results (the row sums for each window) are put into a matrix. In this matrix the rows are preserved, and each column holds the result for one window.

Next, the moving window size increases to 3, so that 3 columns of data are combined (summed). As before, the window shifts by 1 column each step. For the example matrix provided, the first window combines columns 1-2-3, the second combines columns 2-3-4, then 3-4-5, then 4-5-6. The results are put into a separate matrix.

The size of the moving window continues to increase until the window spans all columns. In this example, the largest window combines all 6 columns.

Below are the result matrices for window sizes 2 and 3 for the example matrix mat above. Columns are named according to the columns that were added.

#Window length = 2
mat1<- matrix( c(3,0,0,0,1,0,1,0,0,0,0,0,0,0,3,0,
         2,0,1,1,2,0,0,0,0,0,0,0,0,0,1,0,
         0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
         0,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,
         1,1,0,0,1,0,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat1)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
              c("1_2", "2_3", "3_4", "4_5", "5_6"))

 #Window length 3
 mat8<- matrix( c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
         2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
         0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
         1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow=16)
 dimnames(mat8)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
              c("1_2_3", "2_3_4", "3_4_5", "4_5_6"))

In my example I have 6 columns, so there would be 5 result matrices in total. If I had 600 columns of data, I think a loop would be the most efficient way to iterate over such a large dataset.

Recommended answer

Here is one approach using base R:

lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind, 
   lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))


#[[1]]
#  [,1] [,2] [,3] [,4] [,5]
#a    3    2    0    0    1
#c    0    0    1    1    1
#f    0    1    1    0    0
#h    0    1    1    0    0
#i    1    2    1    1    1
#j    0    0    1    1    0
#l    1    0    0    0    0
#m    0    0    0    1    1
#p    0    0    0    0    1
#q    0    0    0    1    1
#s    0    0    0    1    2
#t    0    0    0    0    2
#u    0    0    0    0    1
#v    0    0    0    1    1
#x    3    1    0    0    0
#z    0    0    0    1    1

#[[2]]
#  [,1] [,2] [,3] [,4]
#a    3    2    0    1
#c    0    1    1    2
#f    1    1    1    0
#h    1    1    1    0
#i    2    2    2    1
#j    0    1    1    1
#l    1    0    0    0
#m    0    0    1    1
#p    0    0    0    1
#q    0    0    1    1
#s    0    0    1    2
#t    0    0    0    2
#u    0    0    0    1
#v    0    0    1    1
#x    3    1    0    0
#z    0    0    1    1
#....
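
The question asks for result columns named after the columns that were summed, which the output above does not carry. The following is a minimal sketch of one way to add those names, assuming the input matrix has column names. The helper name window_sums and the width_ prefix on the list elements are illustrative choices, not part of the original answer.

```r
# Example matrix from the question
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))

# Hypothetical helper (illustrative name and arguments):
# row sums over every window of `width` adjacent columns, with each
# result column named after the source columns it combines, e.g. "1_2".
window_sums <- function(m, width) {
  starts <- seq_len(ncol(m) - width + 1)
  out <- do.call(cbind, lapply(starts, function(i) {
    # drop = FALSE keeps a matrix even for unusual shapes
    rowSums(m[, i:(i + width - 1), drop = FALSE])
  }))
  colnames(out) <- vapply(starts, function(i) {
    paste(colnames(m)[i:(i + width - 1)], collapse = "_")
  }, character(1))
  out
}

# One result matrix per window width, from 2 up to all columns
widths <- 2:ncol(mat)
results <- setNames(lapply(widths, function(w) window_sums(mat, w)),
                    paste0("width_", widths))

colnames(results$width_2)
```

Each element of results can then be pulled out by name, for example results$width_3 for the window of size 3.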


As this is a rolling operation, we can also use rollapply from the zoo package with a variable window width:

lapply(2:ncol(mat), function(j)
    t(zoo::rollapply(seq_len(ncol(mat)), j, function(x) rowSums(mat[,x]))))
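
For the 600-column case mentioned in the question, both approaches above recompute every window sum from scratch. As a possible alternative (a different technique from the answer, sketched here on toy data), each window sum can be taken as the difference of two cumulative row totals, so the cumulative sums are computed only once. The helper name window_sums_cumsum is hypothetical, and with non-integer data the subtraction may introduce small floating-point rounding differences.

```r
# Toy data for illustration only (not the matrix from the question)
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("a", "b", "c"), as.character(1:4)))

# Running totals along each row, with a leading column of zeros
cs <- cbind(0, t(apply(m, 1, cumsum)))

# Hypothetical helper: sums for all windows of `width` columns,
# computed as the difference between two running totals
window_sums_cumsum <- function(cs, width) {
  n <- ncol(cs) - 1
  out <- cs[, (width + 1):(n + 1), drop = FALSE] -
         cs[, 1:(n - width + 1), drop = FALSE]
  colnames(out) <- NULL  # names inherited from cs would be misleading
  out
}

# One result matrix per window width
fast <- lapply(2:ncol(m), function(w) window_sums_cumsum(cs, w))

# Cross-check width 2 against the direct rowSums approach
direct <- sapply(1:3, function(i) rowSums(m[, i:(i + 1)]))
all.equal(fast[[1]], direct, check.attributes = FALSE)
```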
