Return an average of last or first two rows from a different group (indicated by a variable)


Question

This is a follow-up to this question. With data like the following:

data <- structure(list(seq = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 
7L, 7L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L), new_seq = c(2, 2, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
2, 2, 2, 2, NA, NA, NA, NA, NA, 4, 4, 4, 4, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 6, 6, 6, 6, 6, NA, NA, 8, 8, 8, NA, NA, NA), value = c(2L, 
0L, 0L, 3L, 0L, 5L, 5L, 3L, 0L, 3L, 2L, 3L, 2L, 3L, 4L, 1L, 0L, 
0L, 0L, 1L, 1L, 0L, 2L, 5L, 3L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 3L, 
5L, 3L, 1L, 1L, 1L, 0L, 1L, 0L, 4L, 3L, 0L, 3L, 1L, 3L, 0L, 0L, 
1L, 0L, 0L, 3L, 4L, 5L, 3L, 5L, 3L, 5L, 0L, 1L, 1L, 3L, 2L, 1L, 
0L, 0L, 0L, 0L, 5L, 1L, 1L, 0L, 4L, 1L, 5L, 0L, 3L, 1L, 2L, 1L, 
0L, 3L, 0L, 1L, 1L, 3L, 0L, 1L, 1L, 2L, 2L, 1L, 0L, 4L, 0L, 0L, 
3L, 0L, 0L)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", 
"data.frame"))

For every value of new_seq that is not NA, I need to calculate the mean of 2 observations from the respective group in seq (the value of new_seq refers to a value of seq). The issue is that:

  • for rows where new_seq refers to a value of seq that appears later (rows 1:2 in the example), it should be the mean of the FIRST 2 rows of the respective group,
  • for rows where new_seq refers to a value of seq that appears earlier, it should be the mean of the LAST 2 rows of the respective group.
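
To make the rule concrete, here is a minimal base R sketch on a made-up data set. The `toy` data frame and the `mean_of_two` helper are hypothetical names introduced only for illustration; they are not part of the question's data or of any package.

```r
# Toy data: three groups in `seq`; `new_seq` points at another group or is NA.
toy <- data.frame(
  seq     = c(1, 1, 2, 2, 2, 3, 3),
  new_seq = c(2, 2, NA, NA, NA, 2, 2),
  value   = c(9, 9, 10, 20, 30, 9, 9)
)

# Hypothetical helper: mean of the first two values of group `target` when
# `target` comes after `current`, otherwise mean of its last two values.
mean_of_two <- function(df, current, target) {
  if (is.na(target)) return(NA_real_)
  v <- df$value[df$seq == target]
  if (target > current) mean(head(v, 2)) else mean(tail(v, 2))
}

# Apply the rule row by row.
toy$expected <- mapply(mean_of_two, toy$seq, toy$new_seq,
                       MoreArgs = list(df = toy))
```

Rows in group 1 point forward to group 2, so they get the mean of its first two values (10 and 20); rows in group 3 point back to group 2, so they get the mean of its last two values (20 and 30).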

@Z.Lin provided an excellent solution for the second case, but how can it be tweaked to handle both cases? Or is there perhaps another solution with tidyverse?

Recommended answer

I think I got it, so I'm posting an answer for anybody who comes here from search.

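library(dplyr)
library(tidyr)  # for replace_na()

# Mean of the LAST two rows of each group in seq
lookup_backwards <- data %>%
  group_by(seq) %>%
  mutate(rank = seq(n(), 1)) %>%  # counts down, so the last row gets rank 1
  filter(rank <= 2) %>%
  summarise(backwards = mean(value)) %>%
  ungroup()

# Mean of the FIRST two rows of each group in seq
lookup_forwards <- data %>%
  group_by(seq) %>%
  mutate(rank = seq(1, n())) %>%  # counts up, so the first row gets rank 1
  filter(rank <= 2) %>%
  summarise(forwards = mean(value)) %>%
  ungroup()

# Attach both lookups via new_seq, then pick the one that matches the direction.
# new_column stays NA wherever new_seq is NA.
data %>%
  left_join(lookup_backwards, by = c('new_seq' = 'seq')) %>%
  left_join(lookup_forwards, by = c('new_seq' = 'seq')) %>%
  replace_na(list(backwards = 0, forwards = 0)) %>%
  mutate(new_column = ifelse(new_seq > seq, forwards, backwards))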
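
As a possible simplification, both means could be computed in a single summarise() call with head() and tail(), which avoids the rank column and the second lookup table. The sketch below assumes dplyr is installed and uses a small made-up `toy` tibble rather than the question's data; `lookup` and `result` are illustrative names.

```r
library(dplyr)

# Small stand-in data set (hypothetical values, not the question's data).
toy <- tibble(
  seq     = c(1, 1, 2, 2, 2, 3, 3),
  new_seq = c(2, 2, NA, NA, NA, 2, 2),
  value   = c(9, 9, 10, 20, 30, 9, 9)
)

# One lookup row per group: mean of its first two and of its last two values.
lookup <- toy %>%
  group_by(seq) %>%
  summarise(
    forwards  = mean(head(value, 2)),
    backwards = mean(tail(value, 2))
  )

# Join on new_seq, choose by direction, and drop the helper columns.
result <- toy %>%
  left_join(lookup, by = c("new_seq" = "seq")) %>%
  mutate(new_column = if_else(new_seq > seq, forwards, backwards)) %>%
  select(-forwards, -backwards)
```

Rows whose new_seq is NA find no match in the join and end up with NA in new_column, so no replace_na() step is needed in this variant.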
