Summing the counts in a data frame using a sliding window


Problem description

I am new to R. I have a data frame in R like the following:

df <- data.frame(ID=c(rep("A1",10),rep("A2",13),rep("A3",12)),
                 Values=c(10,2,4,23,10,5,20,15,13,21,15,9,19,5,14,25,18,19,31,26,4,21,4,6,7,12,15,18,25,20,16,29,21,19,10))

For every ID, I would like to sum the counts in the column "Values" using a sliding window of 3 positions. The following data frame is an excerpt from df that includes only the records corresponding to A1:

ID    Values
A1     10
A1      2
A1      4
A1     23
A1     10
A1      5
A1     20
A1     15
A1     13
A1     21

I would like to take 3 entries at a time, sum them, and move on to the next 3 entries. When a window cannot accommodate 3 entries, I skip the remaining values.

For example, window_1 starts from the first value (10), window_2 starts from the second value (2), and window_3 starts from the third value (4).

 window_1 = [10+2+4] + [23+10+5] + [20+15+13] = 102 
 window_2 = [2+4+23] + [10+5+20] + [15+13+21] = 113
 window_3 = [4+23+10] + [5+20+15] = 77

and report it in a data frame like the following:

ID  Window_1 Window_2 Window_3
A1   102       113      77

Likewise, I would like to sum the counts in column Values for every ID in the data frame "df" and report them in a data.frame like the following:

ID    window_1   window_2   window_3
A1      102       113         77
A2      206       195         161
A3      198       163         175

I tried the following code:

sum_win_3=0
sum_win_2=0
sum_win_1=0
win_1_counts=0
win_2_counts=0
win_3_counts=0

for (i in seq(1,length(df$Values),3))
{

  if((i+i+1+i+2) %% 3 == 0)
  {
    win_1_counts=df$Values[i]+df$Values[i+1]+df$Values[i+2]
    win_1_counts[is.na(win_1_counts)]=0
    #print(win_1_counts)
  }
  sum_win_1=sum_win_1+win_1_counts
}
#print(sum_win_1)

for (j in seq(2,length(df$Values),3))
{
  if((j+j+1+j+2) %% 3 == 0)
  {
    win_2_counts=df$Values[j]+df$Values[j+1]+df$Values[j+2]
    win_2_counts[is.na(win_2_counts)]=0
    #print(win_2_counts)
  }
  sum_win_2=sum_win_2+win_2_counts
}
#print(sum_win_2)

for (k in seq(3,length(df$Values),3))
{
  if((k+k+1+k+2) %% 3 == 0)
  {
    win_3_counts=df$Values[k]+df$Values[k+1]+df$Values[k+2]
    win_3_counts[is.na(win_3_counts)]=0
    #print(win_3_counts)
  }
  #sum_win_3=sum_win_3+win_3_counts
}
print(sum_win_3)
output=data.frame(ID=df[1],Window_1=sum_win_1,Window_2=sum_win_2,Window_3=sum_win_3)

The above code sums the counts for window_1, window_2 and window_3 by taking all the IDs together rather than working on every ID separately.
Kindly guide me in getting the output in the desired format stated above. Thanks in advance.

Solution

Using the data.table package, I would approach it as follows:

library(data.table)
setDT(df)[, .(w1 = sum(Values[1:(3*(.N%/%3))]),
              w2 = sum(Values[2:(3*((.N-1)%/%3)+1)]),
              w3 = sum(Values[3:(3*((.N-2)%/%3)+2)])), by = ID]

which gives:

   ID  w1  w2  w3
1: A1 102 113  77
2: A2 206 195 161
3: A3 198 163 175
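
To make the index arithmetic above easier to follow, here is a minimal base R sketch for a single vector. The helper name `window_sum` and its `width` argument are hypothetical and not part of the original answer. For a window starting at position `start`, the number of complete blocks that fit is `(length(x) - start + 1) %/% width`, and the blocks cover one contiguous run of indices. The `max(0, ...)` and `seq_len()` guard is an addition of this sketch: an expression such as `1:(3*(.N%/%3))` evaluates to `1:0` (that is, `c(1, 0)`) when a group has fewer than 3 rows, which would select the first value instead of nothing.

```r
# Hypothetical helper: sum all complete blocks of `width` elements of x,
# starting at position `start`; leftover values at the end are skipped.
window_sum <- function(x, start, width = 3) {
  # number of complete blocks that fit (never negative)
  n_blocks <- max(0, (length(x) - start + 1) %/% width)
  # contiguous indices covered by those blocks; empty if no block fits
  idx <- start - 1 + seq_len(n_blocks * width)
  sum(x[idx])
}

# Values for ID A1 from the question
a1 <- c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21)
a1_sums <- sapply(1:3, function(i) window_sum(a1, i))
a1_sums
# [1] 102 113  77
```

Because consecutive blocks touch each other, summing block by block and summing the whole covered range give the same result, which is why the answer can use a single `sum()` per window.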

Or, to avoid the repetition (thanks to @Cath):

setDT(df)[, lapply(1:3, function(i) {sum(Values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = ID]

which gives:

   ID  V1  V2  V3
1: A1 102 113  77
2: A2 206 195 161
3: A3 198 163 175
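
For comparison, the same per-ID sums can be sketched in base R without data.table. This is only one possible way to do it, not the approach of the original answer; the names `sums` and `result` and the `window_` column prefix are choices made for this sketch.

```r
df <- data.frame(ID = c(rep("A1", 10), rep("A2", 13), rep("A3", 12)),
                 Values = c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21,
                            15, 9, 19, 5, 14, 25, 18, 19, 31, 26, 4, 21, 4,
                            6, 7, 12, 15, 18, 25, 20, 16, 29, 21, 19, 10))

# Split Values by ID, then compute the three window sums for each group.
# The result is a 3 x 3 matrix with one column per ID.
sums <- sapply(split(df$Values, df$ID), function(v) {
  sapply(1:3, function(i) {
    k <- max(0, (length(v) - i + 1) %/% 3)  # complete blocks from offset i
    sum(v[i - 1 + seq_len(3 * k)])
  })
})

# Transpose so that each ID becomes a row, then label the columns.
result <- data.frame(ID = colnames(sums), t(sums), row.names = NULL)
names(result)[-1] <- paste0("window_", 1:3)
result
```

On the example data, this should reproduce the table requested in the question, with one row per ID.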

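If you want to rename the V1, V2 & V3 variables, you can do that afterwards (for example with setnames()). You can also assign the names directly with :=, but note that this adds the columns to df by reference, repeating each ID's sums on every row of that ID, rather than returning one row per ID: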

cols <- c("w1","w2","w3")
setDT(df)[, (cols) := lapply(1:3, function(i) {sum(Values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = ID]
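
If the goal is a one-row-per-ID summary that already carries the desired column names, one option is to name the list returned in `j` with base R's `setNames()`. This is a sketch that assumes the data.table package is installed; the name `res` is hypothetical.

```r
library(data.table)

df <- data.frame(ID = c(rep("A1", 10), rep("A2", 13), rep("A3", 12)),
                 Values = c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21,
                            15, 9, 19, 5, 14, 25, 18, 19, 31, 26, 4, 21, 4,
                            6, 7, 12, 15, 18, 25, 20, 16, 29, 21, 19, 10))

cols <- c("w1", "w2", "w3")

# Returning a named list from j yields named summary columns,
# with one row per ID (no := involved, so df gains no new columns).
res <- setDT(df)[, setNames(lapply(1:3, function(i) {
  sum(Values[i:(3 * ((.N - i + 1) %/% 3) + (i - 1))])
}), cols), by = ID]
res
```

Unlike the `:=` form, this leaves the original rows of df untouched and returns a separate three-row table.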
