Return an average of the last or first two rows from a different group (indicated by a variable)
Problem description
This is a follow-up to this question. With data like the following:
data <- structure(list(seq = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L), new_seq = c(2, 2,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
2, 2, 2, 2, NA, NA, NA, NA, NA, 4, 4, 4, 4, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 6, 6, 6, 6, 6, NA, NA, 8, 8, 8, NA, NA, NA), value = c(2L,
0L, 0L, 3L, 0L, 5L, 5L, 3L, 0L, 3L, 2L, 3L, 2L, 3L, 4L, 1L, 0L,
0L, 0L, 1L, 1L, 0L, 2L, 5L, 3L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 3L,
5L, 3L, 1L, 1L, 1L, 0L, 1L, 0L, 4L, 3L, 0L, 3L, 1L, 3L, 0L, 0L,
1L, 0L, 0L, 3L, 4L, 5L, 3L, 5L, 3L, 5L, 0L, 1L, 1L, 3L, 2L, 1L,
0L, 0L, 0L, 0L, 5L, 1L, 1L, 0L, 4L, 1L, 5L, 0L, 3L, 1L, 2L, 1L,
0L, 3L, 0L, 1L, 1L, 3L, 0L, 1L, 1L, 2L, 2L, 1L, 0L, 4L, 0L, 0L,
3L, 0L, 0L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl",
"data.frame"))
For every value of new_seq that is not NA, I need to calculate the mean of 2 observations from the respective group in seq (the value of new_seq refers to a value of seq). The issue is that:
- for those rows where new_seq refers to a value of seq which appears after (rows 1:2 in the example), it should be the mean of the FIRST 2 rows from the respective group,
- for those rows where new_seq refers to a value of seq which appears before, it should be the mean of the LAST 2 rows from the respective group.
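To make the two cases concrete, here is a minimal base R sketch on a small made-up data frame. The name toy and its values are hypothetical and are not taken from the dataset above. Rows in group 1 point forward to group 2, and rows in group 3 point backward to it.

```r
# Hypothetical toy data (not the question's dataset): three groups in seq
toy <- data.frame(
  seq     = c(1, 1, 2, 2, 2, 3, 3),
  new_seq = c(2, 2, NA, NA, NA, 2, 2),
  value   = c(2, 0, 4, 6, 10, 1, 3)
)

# Values of the referenced group (seq == 2), in row order
group2 <- toy$value[toy$seq == 2]

# Rows with seq == 1 refer to a group that appears AFTER them:
# take the mean of the FIRST two rows of that group
forward_mean <- mean(head(group2, 2))

# Rows with seq == 3 refer to a group that appears BEFORE them:
# take the mean of the LAST two rows of that group
backward_mean <- mean(tail(group2, 2))
```

In this toy case the forward-looking rows would get the mean of 4 and 6, and the backward-looking rows the mean of 6 and 10.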
@Z.Lin provided an excellent solution for the second case, but how can it be tweaked to handle both cases? Or is there perhaps another solution using the tidyverse?
Recommended answer
I think I got it, so I am posting an answer for anybody who comes here from a search.
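library(dplyr)
library(tidyr)  # for replace_na()

# Mean of the LAST two rows of each seq group
lookup_backwards <- data %>%
  group_by(seq) %>%
  mutate(rank = n() - row_number() + 1) %>%  # rank 1 = last row of the group
  filter(rank <= 2) %>%
  summarise(backwards = mean(value)) %>%
  ungroup()

# Mean of the FIRST two rows of each seq group
lookup_forwards <- data %>%
  group_by(seq) %>%
  mutate(rank = row_number()) %>%  # rank 1 = first row of the group
  filter(rank <= 2) %>%
  summarise(forwards = mean(value)) %>%
  ungroup()

# Join both lookups on new_seq, then pick one depending on direction.
# new_column stays NA wherever new_seq is NA.
data %>%
  left_join(lookup_backwards, by = c('new_seq' = 'seq')) %>%
  left_join(lookup_forwards, by = c('new_seq' = 'seq')) %>%
  replace_na(list(backwards = 0, forwards = 0)) %>%
  mutate(new_column = ifelse(new_seq > seq, forwards, backwards))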
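As a cross-check of the same lookup idea, the sketch below builds the two per-group means in base R and picks between them by direction. It runs on a small made-up data frame. The names toy, first_two, last_two and key are hypothetical, and this is only one possible way to verify the logic, not the answer's own method.

```r
# Hypothetical toy data (not the question's dataset)
toy <- data.frame(
  seq     = c(1, 1, 2, 2, 2, 3, 3),
  new_seq = c(2, 2, NA, NA, NA, 2, 2),
  value   = c(2, 0, 4, 6, 10, 1, 3)
)

# Per-group lookups: named numeric vectors keyed by the seq value
by_group  <- split(toy$value, toy$seq)
first_two <- vapply(by_group, function(v) mean(head(v, 2)), numeric(1))
last_two  <- vapply(by_group, function(v) mean(tail(v, 2)), numeric(1))

# Look up by name; an NA key yields NA, so rows without new_seq stay NA
key <- as.character(toy$new_seq)
toy$new_column <- ifelse(toy$new_seq > toy$seq,
                         first_two[key],  # referenced group comes later
                         last_two[key])   # referenced group comes earlier
```

Looking the means up by name plays the same role as the two left joins on new_seq in the answer.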