Cumulative aggregates within tidyverse
Problem description
Say I have a tibble (or data.table) which consists of two columns:
library(tibble)
a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))
Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as being the standard deviation.
Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val, cumulated and grouped with respect to id. For example, the first entry of result should contain myfun(val[1]), the second entry should contain myfun(val[1:2]), and so on. I would like to implement a cumulated version of myfun.
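Stated per group, entry i of result is myfun(val[1:i]) computed on that id's values only. As a minimal base R sketch of the cumulative version itself, assuming sd stands in for myfun, a hypothetical helper `cumulate()` (not part of any package) could look like this:

```r
# Hypothetical helper (not from any package): turns a summary function f
# into its cumulative version, so that out[i] == f(x[1:i]).
cumulate <- function(f) {
  function(x) {
    # one call of f per end index i, each on the first i elements
    vapply(seq_along(x), function(i) f(x[seq_len(i)]), numeric(1))
  }
}

# sd stands in for myfun here
cum_sd <- cumulate(sd)

# The val values of group A from the question
res_A <- cum_sd(c(1, 0, 0, 1, 0, 1))
print(res_A)
```

Applying this per id group is exactly what the tidyverse or data.table machinery should take care of.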
Of course, there are a lot of easy solutions outside the tidyverse, using loops and whatnot. But I would be interested in a solution within the tidyverse or within the data.table framework.
Thanks for your help.
Recommended answer
You could do it like this:
library(tidyverse)

a %>%
  group_by(id) %>%
  # for each end index i within the group, take sd() of the first i values
  mutate(y = map_dbl(seq_along(val), ~ sd(val[1:.x]))) %>%
  ungroup()
# # A tibble: 12 x 3
# id val y
# <chr> <dbl> <dbl>
# 1 A 1 NA
# 2 A 0 0.7071068
# 3 A 0 0.5773503
# 4 A 1 0.5773503
# 5 A 0 0.5477226
# 6 A 1 0.5477226
# 7 B 0 NA
# 8 B 0 0.0000000
# 9 B 0 0.0000000
# 10 B 1 0.5000000
# 11 B 1 0.5477226
# 12 B 1 0.5477226
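The answer above is tidyverse-only. Since the question also mentions data.table, here is a comparable sketch (not part of the original answer), assuming the data.table package is installed: `by = id` splits the rows by group, `.N` is the current group's size, and `:=` adds the column by reference.

```r
library(data.table)

a <- data.table(id = rep(c("A", "B"), each = 6),
                val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Within each id group, .N is the group size; for every end index i,
# apply sd() to the first i values of val in that group.
a[, result := vapply(seq_len(.N), function(i) sd(val[1:i]), numeric(1)), by = id]
print(a)
```

Swapping sd for any other function that returns a single number gives the corresponding cumulative column.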
Explanation
As is common in tidyverse chains, we first group, then use mutate rather than summarize, because we want to keep the original, unaggregated rows.
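To see the difference between the two verbs, here is a small sketch, assuming dplyr is installed (it re-exports tibble() and %>%). Using a whole-group sd for simplicity, summarize returns one row per group while mutate keeps every input row:

```r
library(dplyr)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# summarize() collapses each group to a single row: 2 rows in total
by_group <- a %>% group_by(id) %>% summarize(s = sd(val))

# mutate() keeps all 12 original rows, repeating the group value on each
per_row <- a %>% group_by(id) %>% mutate(s = sd(val)) %>% ungroup()

print(nrow(by_group))
print(nrow(per_row))
```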
The function map_dbl is used here to loop over a vector of end indices. Because the data are grouped, val inside mutate refers only to the current group's values, so seq_along(val) will be 1:6 for both groups here.
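A quick check of that claim, assuming dplyr is installed: inside a grouped mutate, seq_along(val) restarts at 1 for each id.

```r
library(dplyr)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# val is evaluated per group, so the index sequence restarts for each id
idx <- a %>% group_by(id) %>% mutate(i = seq_along(val)) %>% ungroup()
print(idx$i)
```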
Functions from the map family accept the ~ (formula) notation for an anonymous function, in which the function's first argument is referred to as .x.
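A minimal illustration, assuming purrr is installed: the formula ~ .x^2 behaves like the explicit anonymous function function(x) x^2.

```r
library(purrr)

# ~ .x^2 is shorthand for an anonymous function whose argument is .x
squares_formula  <- map_dbl(1:4, ~ .x^2)
squares_function <- map_dbl(1:4, function(x) x^2)
print(squares_formula)
```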
Looping through these indices, we first compute sd(val[1:1]), which is sd(val[1]), which is NA (the standard deviation needs at least two values); then sd(val[1:2]), and so on.
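The NA in the first row of each group comes straight from base R's sd(), which divides by n - 1 and therefore has no defined value for a single observation:

```r
# sd() uses the n - 1 denominator, so a single value gives NA
first  <- sd(c(1))      # NA
second <- sd(c(1, 0))   # sqrt(0.5), about 0.7071068
print(c(first, second))
```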
By design, map_dbl returns a vector of doubles, one per index, and this vector becomes the y column.
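To contrast the two return types, here is a sketch assuming purrr is installed and using the val values of group A: plain map would return a list, while map_dbl returns an atomic double vector that can be used directly as a column.

```r
library(purrr)

val <- c(1, 0, 0, 1, 0, 1)

# map() returns a list with one element per end index
as_list <- map(seq_along(val), ~ sd(val[1:.x]))

# map_dbl() returns a double vector of the same length
as_dbl <- map_dbl(seq_along(val), ~ sd(val[1:.x]))

print(class(as_list))
print(typeof(as_dbl))
```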