Summing the counts in a data frame using a sliding window
Problem description
I am new to R. I have a data frame like the following:
df <- data.frame(ID=c(rep("A1",10),rep("A2",13),rep("A3",12)),
Values=c(10,2,4,23,10,5,20,15,13,21,15,9,19,5,14,25,18,19,31,26,4,21,4,6,7,12,15,18,25,20,16,29,21,19,10))
For every ID, I would like to sum the counts in the column "Values" using sliding windows of 3 positions. The following data frame is an excerpt from df that includes only the records corresponding to A1:
ID Values
A1 10
A1 2
A1 4
A1 23
A1 10
A1 5
A1 20
A1 15
A1 13
A1 21
I would like to take 3 entries at a time, sum them, and move on to the next 3 entries. When a window cannot accommodate 3 entries, I skip the remaining values.
For example, window_1 starts from the first value (10), window_2 starts from the second value (2), and window_3 starts from the third value (4).
window_1 = [10+2+4] + [23+10+5] + [20+15+13] = 102
window_2 = [2+4+23] + [10+5+20] + [15+13+21] = 113
window_3 = [4+23+10] + [5+20+15] = 77
and report it in a data frame like the following:
ID Window_1 Window_2 Window_3
A1 102 113 77
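The window arithmetic above can be sketched in base R for a single vector. The helper name window_sum and its arguments are hypothetical, chosen only for illustration; this is a minimal sketch, not the only way to express it.

```r
# Hypothetical helper: sum every complete window of `width` values,
# beginning at position `start`; leftover values at the end are skipped.
window_sum <- function(x, start, width = 3) {
  n_complete <- (length(x) - start + 1) %/% width  # number of full windows
  if (n_complete < 1) return(0)                    # no full window fits
  idx <- seq(start, length.out = n_complete * width)
  sum(x[idx])
}

# The A1 records from the excerpt above
a1 <- c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21)
a1_sums <- sapply(1:3, function(s) window_sum(a1, s))
a1_sums  # 102 113 77
```

Counting the complete windows first avoids having to test each triple for missing values.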
Likewise, I would like to sum the counts in the column Values for every ID in the data frame "df" and report them in a data.frame like the following:
ID window_1 window_2 window_3
A1 102 113 77
A2 206 195 161
A3 198 163 175
I tried the following code:
sum_win_3=0
sum_win_2=0
sum_win_1=0
win_1_counts=0
win_2_counts=0
win_3_counts=0
for (i in seq(1,length(df$Values),3))
{
  if((i+i+1+i+2) %% 3 == 0)
  {
    win_1_counts=df$Values[i]+df$Values[i+1]+df$Values[i+2]
    win_1_counts[is.na(win_1_counts)]=0
    #print(win_1_counts)
  }
  sum_win_1=sum_win_1+win_1_counts
}
#print(sum_win_1)
for (j in seq(2,length(df$Values),3))
{
  if((j+j+1+j+2) %% 3 == 0)
  {
    win_2_counts=df$Values[j]+df$Values[j+1]+df$Values[j+2]
    win_2_counts[is.na(win_2_counts)]=0
    #print(win_2_counts)
  }
  sum_win_2=sum_win_2+win_2_counts
}
#print(sum_win_2)
for (k in seq(3,length(df$Values),3))
{
  if((k+k+1+k+2) %% 3 == 0)
  {
    win_3_counts=df$Values[k]+df$Values[k+1]+df$Values[k+2]
    win_3_counts[is.na(win_3_counts)]=0
    #print(win_3_counts)
  }
  #sum_win_3=sum_win_3+win_3_counts
}
print(sum_win_3)
output=data.frame(ID=df[1],Window_1=sum_win_1,Window_2=sum_win_2,Window_3=sum_win_3)
The above code sums the counts for window_1, window_2 and window_3 across all the IDs together rather than working on each ID separately.
Kindly guide me in getting the output in the desired format stated above.
Thanks in advance
Using the data.table package, I would approach it as follows:
library(data.table)
setDT(df)[, .(w1 = sum(Values[1:(3*(.N%/%3))]),
              w2 = sum(Values[2:(3*((.N-1)%/%3)+1)]),
              w3 = sum(Values[3:(3*((.N-2)%/%3)+2)])), by = ID]
which gives:
ID w1 w2 w3
1: A1 102 113 77
2: A2 206 195 161
3: A3 198 163 175
Or, to avoid the repetition (thanks to @Cath):
setDT(df)[, lapply(1:3, function(i) {sum(Values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = ID]
which gives:
ID V1 V2 V3
1: A1 102 113 77
2: A2 206 195 161
3: A3 198 163 175
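To see why the index expression works, it helps to evaluate its upper bound on its own. The sketch below uses a hypothetical helper, last_index, which is just the end point of the range in the one-liner above, with n standing in for .N. It also shows a caveat that is an observation about R's colon operator rather than something stated in the original answer: a group too short to hold a single window makes the range run backwards.

```r
# Last position covered by complete triples that start at position i,
# for a group of n rows (n plays the role of .N in the data.table call).
last_index <- function(n, i) 3 * ((n - i + 1) %/% 3) + (i - 1)

ends_a1 <- last_index(10, 1:3)  # A1 has 10 rows: 9 10 8
ends_a2 <- last_index(13, 1:3)  # A2 has 13 rows: 12 13 11

# Caveat: with only 2 rows no window fits, and the end index is 0.
# 1:0 evaluates to c(1, 0), so the subset still picks up the first value.
x <- c(7, 8)
short_group_sum <- sum(x[1:last_index(2, 1)])  # 7, not 0
```

For the data in the question every ID has at least 5 rows, so the caveat does not bite; for arbitrary data it may be worth guarding against short groups.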
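If you want to rename the V1, V2 & V3 variables, you can do that afterwards. You can also assign the sums by reference under your own names; note that := adds them to df as new columns, repeating each ID's sums on every row of that ID, rather than returning a one-row-per-ID summary: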
cols <- c("w1","w2","w3")
setDT(df)[, (cols) := lapply(1:3, function(i) {sum(Values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = ID]
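If data.table is not available, the same per-ID summary can be sketched in base R with split() and sapply(). The names triple_sum, by_id, sums and result are hypothetical, and this is one possible approach under those assumptions rather than part of the original answer.

```r
df <- data.frame(ID = c(rep("A1", 10), rep("A2", 13), rep("A3", 12)),
                 Values = c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21,
                            15, 9, 19, 5, 14, 25, 18, 19, 31, 26, 4, 21, 4,
                            6, 7, 12, 15, 18, 25, 20, 16, 29, 21, 19, 10))

# Sum all complete triples of v starting at position s (hypothetical helper)
triple_sum <- function(v, s) {
  n_full <- (length(v) - s + 1) %/% 3          # number of complete triples
  if (n_full < 1) return(0)
  sum(v[seq(s, length.out = 3 * n_full)])
}

by_id <- split(df$Values, df$ID)               # one vector of Values per ID
sums <- t(sapply(by_id, function(v) sapply(1:3, function(s) triple_sum(v, s))))

result <- data.frame(ID = rownames(sums), sums, row.names = NULL)
names(result)[-1] <- paste0("Window_", 1:3)
result
```

This returns one row per ID with the columns Window_1, Window_2 and Window_3, matching the layout requested in the question.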