dplyr mutate: find max of n next values in column
Question
Given the following tibble:
library(tidyverse)
set.seed(1)
my_tbl = tibble(x = rep(words[1:5], 50) %>% sort(),
                y = 1:250,
                z = sample(seq(from = 30, to = 90, by = 0.1), size = 250, replace = TRUE))
I'm trying to create a new column that holds the maximum of the next 3 values in column z.
For example:
For row 1, max_3_next should be 84.5 (from row 4).
For row 5, max_3_next should be 86.7 (from row 7).
This is what I tried:
my_tbl %>%
  mutate(max_3_next = max(.$z[(y + 1):(y + 3)]))
> my_tbl %>%
+ mutate(max_3_next = max(.$z[(y + 1):(y + 3)]))
# A tibble: 250 x 4
x y z max_3_next
<chr> <int> <dbl> <dbl>
1 a 1 45.9 84.5
2 a 2 52.3 84.5
3 a 3 64.4 84.5
4 a 4 84.5 84.5
5 a 5 42.1 84.5
6 a 6 83.9 84.5
7 a 7 86.7 84.5
8 a 8 69.7 84.5
9 a 9 67.8 84.5
10 a 10 33.7 84.5
# ... with 240 more rows
Warning messages:
1: In (y + 1):(y + 3) :
numerical expression has 250 elements: only the first used
2: In (y + 1):(y + 3) :
numerical expression has 250 elements: only the first used
I get the warnings above, and every row gets the same value.
How can I change the code to achieve the desired result?
My preference is for a dplyr solution, but I'd be happy to learn other solutions as well, since performance is an issue: the original dataset may have about 1M rows.
Thanks,
Rafael
Recommended answer
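It may help to first see why the attempt in the question returns 84.5 for every row. Inside mutate, y is the whole column, and the : operator uses only the first element of each operand (hence the warning), so the index collapses to 2:4 for all rows. A minimal base R sketch of that behaviour, using the first ten z values copied from the output above (the names idx and single_max are illustrative only):

```r
# First ten z values, copied from the output shown in the question
z <- c(45.9, 52.3, 64.4, 84.5, 42.1, 83.9, 86.7, 69.7, 67.8, 33.7)
y <- seq_along(z)

# `:` keeps only the first element of each side, so this is what
# (y + 1):(y + 3) effectively evaluates to
idx <- (y[1] + 1):(y[1] + 3)

# max() reduces those three values to a single number,
# which mutate() then recycles to every row
single_max <- max(z[idx])
```

Here idx is 2:4 and single_max is 84.5, which matches the constant column in the question's output.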
We can use rollmax from the zoo library with align = "left", so that each window covers the current observation along with the following two observations.
library(zoo)
my_tbl %>%
  mutate(max_3_next = rollmax(z, 3, fill = NA, align = "left"))
# A tibble: 250 x 4
x y z max_3_next
<chr> <int> <dbl> <dbl>
1 a 1 45.9 64.4
2 a 2 52.3 84.5
3 a 3 64.4 84.5
4 a 4 84.5 84.5
5 a 5 42.1 86.7
6 a 6 83.9 86.7
7 a 7 86.7 86.7
8 a 8 69.7 69.7
9 a 9 67.8 67.8
10 a 10 33.7 42.3
# ... with 240 more rows
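Note that a left-aligned window of width 3 includes the current row, whereas the question asks for the three rows after it. The difference can be sketched in base R on the first ten z values from the output above. This is only an illustration of the two window definitions, not how zoo computes them, and the names incl_current and next_only are made up for the example:

```r
# First ten z values, copied from the output shown in the question
z <- c(45.9, 52.3, 64.4, 84.5, 42.1, 83.9, 86.7, 69.7, 67.8, 33.7)
n <- length(z)

# Width-3 window starting AT the current row (rows i, i+1, i+2)
incl_current <- sapply(1:(n - 2), function(i) max(z[i:(i + 2)]))

# Window of the 3 rows AFTER the current row (rows i+1, i+2, i+3)
next_only <- sapply(1:(n - 3), function(i) max(z[(i + 1):(i + 3)]))
```

For row 1, incl_current gives 64.4 (as in the output above), while next_only gives the 84.5 that the question expects.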
Sorry, I believe I misunderstood the OP. Here is the correct solution, I hope, inspired by Joshua Ulrich's answer to this question. I will keep the previous answer in case future readers need it.
my_tbl %>%
  mutate(max_3_next = rollapply(z, list(1:3), max, fill = NA, align = "left", partial = TRUE))
# A tibble: 250 x 4
x y z max_3_next
<chr> <int> <dbl> <dbl>
1 a 1 45.9 84.5
2 a 2 52.3 84.5
3 a 3 64.4 84.5
4 a 4 84.5 86.7
5 a 5 42.1 86.7
6 a 6 83.9 86.7
7 a 7 86.7 69.7
8 a 8 69.7 67.8
9 a 9 67.8 42.3
10 a 10 33.7 71.2
# ... with 240 more rows
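Since the question prefers a dplyr solution, one possible alternative that avoids zoo is to combine lead() with pmax(). This is a sketch under assumptions rather than a benchmarked recommendation: it hard-codes a look-ahead of 3, uses a small demo_tbl (a made-up name) built from the first ten z values shown above, and relies on na.rm = TRUE so that the last rows use whatever following values exist.

```r
library(dplyr)

# First ten z values, copied from the output shown in the question
demo_tbl <- tibble(z = c(45.9, 52.3, 64.4, 84.5, 42.1, 83.9, 86.7, 69.7, 67.8, 33.7))

demo_tbl <- demo_tbl %>%
  mutate(
    # lead(z, k) shifts z up by k rows; pmax() takes the row-wise maximum.
    # na.rm = TRUE ignores the NAs that lead() pads at the end, so the
    # final row (no following values at all) stays NA.
    max_3_next = pmax(lead(z, 1), lead(z, 2), lead(z, 3), na.rm = TRUE)
  )
```

On this ten-row sample the first seven rows agree with the rollapply output above. The last three differ only because the sample stops at row 10, so fewer following values are available. Both lead() and pmax() are vectorised, so this should scale reasonably to large tables, though that is worth timing on the real data.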