基于组中的条件汇总行 [英] Summing rows based on conditional in groups
问题描述
之前,我询问与该问题有关但是我需要更优雅,更通用的方法来解决这个问题。
我将数据分为几组,我想根据条件对范围内的某些行求和。我更喜欢使用'dplyr'来执行此操作,因为它使我更容易理解。
Previously I asked related to this question but I need more elegant and general way to solve this. I have data separated in groups and I want to sum some rows in range based on conditional. I prefer to use 'dplyr' to do this because it's more straight forward for me to understand.
我需要的条件如下:
1:对于组1;
找到第一个出现的'10',并将该出现后的行加到组的末尾并计算行数。
1: for group 1 ; find the first occurrence of '10' and sum the rows after this occurrence to the end of the group and count how many rows.
2:对于组2;'找到最后一次出现的'10',然后将出现该行的之前加到该组的开头,并计算多少行!
2: for group 2;'find the last occurrence of '10' and and sum the rows before this occurrence to the beginning of the group and count how many rows!
3:对于第3组;找到 10的第一个匹配项,并将该匹配项的之前行加到该组的开始行,并计算出多少行。
3: for group 3; find the first occurrence of '10' and and sum the rows before this occurrence to the starting row of the group and count how many rows.
df <- data.frame(gr=rep(c(1,2,3),c(7,9,11)),
y_value=c(c(0,0,10,8,8,6,0),c(10,10,10,8,7,6,2,0,0), c(8,5,8,7,6,2,10,10,8,7,0)))
> df
gr y_value
1 1 0
2 1 0
3 1 10
4 1 8
5 1 8
6 1 6
7 1 0
8 2 10
9 2 10
10 2 10
11 2 8
12 2 7
13 2 6
14 2 2
15 2 0
16 2 0
17 3 8
18 3 5
19 3 8
20 3 7
21 3 6
22 3 2
23 3 10
24 3 10
25 3 8
26 3 7
27 3 0
它猜测这样的方法应该可以工作,但是无法弄清楚如何将其实现为 dplyr
It guess something like this should work but cannot figured out how to implement this to dplyr
count <- function(y,gr){
if (any(y==10)&(gr==1)) {
*
*
*
if (any(y==10)&(gr==2))
*
*
*
*
}
}
df%>%
library(dplyr)
df %>%
group_by(gr) %>%
do(data.frame(.,count_rows=count(y_value,gr)))
预期产量
> df
gr y_value sum nrow
1 1 0 22 4
2 1 0 22 4
3 1 10 22 4
4 1 8 22 4
5 1 8 22 4
6 1 6 22 4
7 1 0 22 4
8 2 10 23 6
9 2 10 23 6
10 2 10 23 6
11 2 8 23 6
12 2 7 23 6
13 2 6 23 6
14 2 2 23 6
15 2 0 23 6
16 2 0 23 6
17 3 8 28 6
18 3 5 28 6
19 3 7 28 6
20 3 6 28 6
21 3 2 28 6
22 3 10 28 6
23 3 10 28 6
24 3 8 28 6
25 3 7 28 6
26 3 0 28 6
推荐答案
希望这会有所帮助!
(编辑说明:在OP更新其原始要求后修改了代码)
Hope this helps!
(Edit note: modified code after OP updated his original requirement)
#sample data - I slightly changed sample data (replaced 0 by 10 in 2nd row) for group 1 to satisfy your condition
df <- data.frame(gr=rep(c(1,2,3),c(7,9,11)),
y_value=c(c(0,10,10,8,8,6,0),c(10,10,10,8,7,6,2,0,0), c(8,5,8,7,6,2,10,10,8,7,0)))
library(dplyr)
df_temp <- df %>%
group_by(gr) %>%
mutate(rows_to_aggregate=cumsum(y_value==10)) %>%
filter(ifelse(gr==1, rows_to_aggregate !=0, ifelse(gr==2, rows_to_aggregate ==0 | y_value==10, rows_to_aggregate ==0))) %>%
filter(ifelse(gr==1, row_number(gr) != 1, ifelse(gr==2, row_number(gr) != n(), rows_to_aggregate ==0))) %>%
mutate(nrow=n(), sum=sum(y_value)) %>%
select(gr,sum,nrow) %>%
distinct()
#final output
df<- left_join(df,df_temp, by='gr')
这篇关于基于组中的条件汇总行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!