dplyr mutate: find max of n next values in column

Views: 113 | Published: 2020/10/26 5:02:09 | Tags: r, dplyr


Question

Given the following tibble:

library(tidyverse)

set.seed(1)

# `words` is the example character vector from stringr (attached by tidyverse)
my_tbl <- tibble(x = rep(words[1:5], 50) %>% sort(),
                 y = 1:250,
                 z = sample(seq(from = 30, to = 90, by = 0.1), size = 250, replace = TRUE))

I'm trying to create a new column that holds the maximum of the next 3 values in column z.

For example:

for row 1, max_3_next should be 84.5 (the value in row 4)

for row 5, max_3_next should be 86.7 (the value in row 7)

This is what I tried:

my_tbl %>%
  mutate(max_3_next = max(.$z[(y + 1):(y + 3)]))

> my_tbl %>%
+   mutate(max_3_next =  max(.$z[(y + 1):(y + 3)])) 
# A tibble: 250 x 4
   x         y     z max_3_next
   <chr> <int> <dbl>      <dbl>
 1 a         1  45.9       84.5
 2 a         2  52.3       84.5
 3 a         3  64.4       84.5
 4 a         4  84.5       84.5
 5 a         5  42.1       84.5
 6 a         6  83.9       84.5
 7 a         7  86.7       84.5
 8 a         8  69.7       84.5
 9 a         9  67.8       84.5
10 a        10  33.7       84.5
# ... with 240 more rows
Warning messages:
1: In (y + 1):(y + 3) :
  numerical expression has 250 elements: only the first used
2: In (y + 1):(y + 3) :
  numerical expression has 250 elements: only the first used

I get the warnings shown above, and every row ends up with the same value.
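A brief note on where the warnings come from: inside mutate(), y is the whole column, and the `:` operator only uses the first element of each endpoint. The expression therefore collapses to the single index range 2:4, and max() returns one number that is recycled to every row. The base R sketch below illustrates this with the first five rows of the table, copied by hand; it is only meant to show the mechanism, and it assumes an R version in which this situation is a warning rather than an error.

```r
# First rows of y and z from the table above (copied by hand for illustration).
y <- 1:5
z <- c(45.9, 52.3, 64.4, 84.5, 42.1)

# `:` uses only y[1], so this is just 2:4 (the warning is silenced here).
idx <- suppressWarnings((y + 1):(y + 3))

# max() collapses z[2:4] to a single number, which mutate() recycles to all rows.
single_max <- max(z[idx])

print(idx)
print(single_max)
```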

How can I change the code to achieve the desired result?

My preference is for a dplyr solution, but I'll be happy to learn other approaches as well. Performance matters, since the original dataset may have around 1M rows.

Thanks,
Rafael

Recommended answer

We can use rollmax from the zoo package with align = "left", which makes each window cover the current observation together with the following two observations.

library(zoo)
my_tbl %>%
   mutate(max_3_next = rollmax(z,3, fill = NA, align = "left"))


# A tibble: 250 x 4
   x         y     z max_3_next
   <chr> <int> <dbl>      <dbl>
 1 a         1  45.9       64.4
 2 a         2  52.3       84.5
 3 a         3  64.4       84.5
 4 a         4  84.5       84.5
 5 a         5  42.1       86.7
 6 a         6  83.9       86.7
 7 a         7  86.7       86.7
 8 a         8  69.7       69.7
 9 a         9  67.8       67.8
10 a        10  33.7       42.3
# ... with 240 more rows
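Note that a left-aligned window of width 3 includes the current row, which is why row 1 shows 64.4 (the max of rows 1 to 3) rather than the 84.5 the question asks for. To make the window explicit, here is a minimal base R sketch of the same idea; `left_window_max` is a hypothetical helper written for illustration, not a zoo function, and returning NA for incomplete windows at the end is an assumption chosen to mimic fill = NA.

```r
# Hypothetical helper: max over rows i, i+1, ..., i+width-1 (current row included).
left_window_max <- function(z, width = 3) {
  vapply(seq_along(z), function(i) {
    end <- i + width - 1
    # Incomplete windows at the tail yield NA (assumed to mirror fill = NA).
    if (end > length(z)) NA_real_ else max(z[i:end])
  }, numeric(1))
}

# First seven z values from the table above, copied by hand.
z <- c(45.9, 52.3, 64.4, 84.5, 42.1, 83.9, 86.7)
res <- left_window_max(z)
print(res)
```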

Sorry, I believe I misunderstood the OP. Here is what I hope is the correct solution, inspired by Joshua Ulrich's answer to a related question. I will keep the previous answer in case future readers need it.

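# Passing a list as `width` makes zoo treat 1:3 as offsets from the current row,
# so each window covers rows i+1, i+2 and i+3 and excludes row i itself.
my_tbl %>%
  mutate(max_3_next = rollapply(z, list(1:3), max, fill = NA, align = "left", partial = TRUE))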

# A tibble: 250 x 4
   x         y     z max_3_next
   <chr> <int> <dbl>      <dbl>
 1 a         1  45.9       84.5
 2 a         2  52.3       84.5
 3 a         3  64.4       84.5
 4 a         4  84.5       86.7
 5 a         5  42.1       86.7
 6 a         6  83.9       86.7
 7 a         7  86.7       69.7
 8 a         8  69.7       67.8
 9 a         9  67.8       42.3
10 a        10  33.7       71.2
# ... with 240 more rows
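Since the question mentions a large table and a preference for a dplyr-style solution, one possible alternative, which is not part of the answer above, is to shift the column by 1, 2 and 3 positions and take the element-wise maximum with pmax(). The sketch below uses base R only so that it runs without extra packages; `max_next_n` is a hypothetical helper name, and its behaviour on the last rows (NA when no next value exists, partial windows otherwise) is an assumption that may differ from the rollapply output.

```r
# Hypothetical helper: element-wise max of the next n values, current row excluded.
max_next_n <- function(z, n = 3) {
  # For each offset k, shift z up by k positions and pad the tail with NA.
  shifted <- lapply(seq_len(n), function(k) c(z, rep(NA, k))[seq_along(z) + k])
  # pmax() compares the shifted copies position by position;
  # na.rm = TRUE keeps partial windows and gives NA only when all are missing.
  do.call(pmax, c(shifted, list(na.rm = TRUE)))
}

# First ten z values from the table above, copied by hand.
z <- c(45.9, 52.3, 64.4, 84.5, 42.1, 83.9, 86.7, 69.7, 67.8, 33.7)
res <- max_next_n(z, 3)
print(res)
```

Inside a pipeline this could be called as mutate(max_3_next = max_next_n(z, 3)). The work is done by a handful of vectorized operations rather than one function call per row, which is the reason it may scale better to around 1M rows; that should be confirmed by benchmarking on the real data.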
