Cumulative aggregates within tidyverse

Question

Say I have a tibble (or data.table) which consists of two columns:

a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as being the standard deviation.

Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val cumulatively, grouped by id. For example, the first entry of result should contain myfun(val[1]). The second entry should contain myfun(val[1:2]), and so on. In other words, I would like to implement a cumulative version of myfun.

Of course there are a lot of easy solutions outside the tidyverse using loops and whatnot. But I would be interested in a solution within the tidyverse or within the data.table framework.

Thanks for your help.

Recommended answer

You can do it this way:

library(tidyverse)

a %>%
  group_by(id) %>%
  # for each row i within a group, apply sd() to the first i values of val
  mutate(y = map_dbl(seq_along(val), ~ sd(val[1:.x]))) %>%
  ungroup()

# # A tibble: 12 x 3
#       id   val         y
#    <chr> <dbl>     <dbl>
#  1     A     1        NA
#  2     A     0 0.7071068
#  3     A     0 0.5773503
#  4     A     1 0.5773503
#  5     A     0 0.5477226
#  6     A     1 0.5477226
#  7     B     0        NA
#  8     B     0 0.0000000
#  9     B     0 0.0000000
# 10     B     1 0.5000000
# 11     B     1 0.5477226
# 12     B     1 0.5477226

Explanation

We first group, as is common in tidyverse chains, then use mutate rather than summarize, because we want to keep the same unaggregated rows.
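To make the difference concrete, here is a minimal sketch that rebuilds the tibble a from the question and compares the row counts. The names collapsed and kept are only illustrative, and cumsum stands in for a cumulative function so the example stays short.

```r
library(dplyr)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# summarize() collapses each group to a single row
collapsed <- a %>% group_by(id) %>% summarize(y = sd(val))
nrow(collapsed)  # 2

# mutate() keeps every original row and adds the new column alongside
kept <- a %>% group_by(id) %>% mutate(y = cumsum(val)) %>% ungroup()
nrow(kept)       # 12
```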

Here map_dbl loops over a vector of end indices. seq_along(val) is 1:6 for both groups, since each group has six rows.
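The same loop over end indices can be sketched in base R for an arbitrary function, which is what the question's myfun calls for. The helper name cumulate is hypothetical and not part of any package; it assumes f returns a single number.

```r
# Hypothetical helper: apply f to x[1], x[1:2], ..., x[1:n]
cumulate <- function(x, f) {
  vapply(seq_along(x), function(i) f(x[seq_len(i)]), numeric(1))
}

val <- c(1, 0, 0, 1, 0, 1)

seq_along(val)       # 1 2 3 4 5 6
cumulate(val, sum)   # same values as cumsum(val)
cumulate(val, sd)    # cumulative standard deviation, NA in the first position
```

Inside a grouped mutate, a call such as cumulate(val, myfun) would then play the role of the map_dbl expression.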

With functions from the map family we can use the ~ notation, which builds an anonymous function whose first argument is named .x.
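As a small illustration, the formula shorthand and an explicit anonymous function should give the same result. The variable names with_formula and with_function are only for this sketch.

```r
library(purrr)

val <- c(1, 0, 0, 1, 0, 1)

# Formula shorthand: .x is the current element of seq_along(val)
with_formula <- map_dbl(seq_along(val), ~ sd(val[1:.x]))

# Equivalent explicit anonymous function
with_function <- map_dbl(seq_along(val), function(i) sd(val[1:i]))

identical(with_formula, with_function)  # TRUE
```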

Looping through these indices, we first compute sd(val[1:1]), which is sd(val[1]) and returns NA, then sd(val[1:2]), and so on.
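The first few steps for group A can be checked by hand. This is just a sketch using the six val entries of that group from the question.

```r
val <- c(1, 0, 0, 1, 0, 1)

sd(val[1:1])  # NA: the sample standard deviation is undefined for a single value
sd(val[1:2])  # 0.7071068
sd(val[1:3])  # 0.5773503
```

These match rows 1 to 3 of the y column in the output above.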

By design, map_dbl returns a vector of doubles, which becomes the y column.
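Since the question also mentions data.table, the same idea could be sketched there as follows. This is not part of the original answer; it assumes the data.table package is installed, uses the hypothetical table name dt, and swaps map_dbl for base R's vapply to avoid the purrr dependency.

```r
library(data.table)

dt <- data.table(id = rep(c("A", "B"), each = 6),
                 val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Add y by reference, computed separately within each id group
dt[, y := vapply(seq_along(val),
                 function(i) sd(val[seq_len(i)]),
                 numeric(1)),
   by = id]
```

Replacing sd with another function that returns a single number gives the cumulative version of that function.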
