Add multiple output variables using purrr and a predefined function
Question
Take this simple dataset and function (representative of more complex problems):
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n
Using base R's Map, I could do this to add 2 new columns in a vectorised fashion:
ns <- 1:2
x[paste0("new",seq_along(ns))] <- Map(mult, x["a"], x["b"], n=ns)
x
#  a b new1 new2
#1 1 2    3    6
#2 2 3    5   10
#3 3 4    7   14
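To see why this works: Map is a wrapper around mapply, which recycles shorter arguments. Here x["a"] and x["b"] are one-element lists (each holding a whole column), so they are reused for every value of ns. A minimal base-R sketch of the equivalent explicit calls (the names via_map and unrolled are only for illustration):

```r
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

# Map() recycles the length-1 lists x["a"] and x["b"] across ns,
# so mult() is called once per value of n with the full columns.
via_map <- Map(mult, x["a"], x["b"], n = ns)

# The same calls written out by hand (illustrative name).
unrolled <- list(mult(x$a, x$b, 1L), mult(x$a, x$b, 2L))

# Compare values only, since Map() may attach names to its result.
all.equal(unname(via_map), unrolled)
```

Each element of the result is a full column, which is why the list can be assigned straight into x.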
A purrr attempt via pmap gets close with a list output:
library(purrr)
library(dplyr)
x %>% select(a,b) %>% pmap(mult, n=1:2)
#[[1]]
#[1] 3 6
#
#[[2]]
#[1] 5 10
#
#[[3]]
#[1] 7 14
My attempts from here with pmap_dfr etc. all seem to error out when trying to map this back to new columns.
How do I end up making 2 further variables which match my current "new1"/"new2"? I'm sure there is a simple incantation, but I'm clearly overlooking it or using the wrong *map* function.
There is some useful discussion here - How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs - but it seems overly hacky and inflexible for what I imagined was a simple problem.
Recommended answer
The best approach I've found (which is still not terribly elegant) is to pipe into bind_cols. To get pmap_dfr to work correctly, the function should return a named list (which may or may not be a data frame):
library(tidyverse)
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) as.list(set_names((a + b) * n, paste0('new', n)))
x %>% bind_cols(pmap_dfr(., mult, n = 1:2))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14
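The named list matters because pmap_dfr row-binds whatever each call returns, so each call has to yield something that looks like one row. A rough base-R sketch of that idea, using Map and rbind as stand-ins (an approximation for illustration, not purrr's actual implementation; mult_row, rows, new_cols and result are hypothetical names):

```r
x <- data.frame(a = 1:3, b = 2:4)

# Returns one named list per call, i.e. one "row" with columns new1 and new2.
mult_row <- function(a, b, n) as.list(setNames((a + b) * n, paste0("new", n)))

# Call once per row of x, passing the full vector n = 1:2 each time.
rows <- Map(mult_row, x$a, x$b, MoreArgs = list(n = 1:2))

# Row-bind the one-row pieces, then attach them to x.
new_cols <- do.call(rbind, lapply(rows, as.data.frame))
result <- cbind(x, new_cols)
result
```

Without the names, there would be nothing to tell the row-binding step which value belongs in which column.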
To avoid changing the definition of mult, you can wrap it in an anonymous function:
mult <- function(a,b,n) (a + b) * n
x %>% bind_cols(pmap_dfr(
    .,
    ~as.list(set_names(
        mult(...),
        paste0('new', 1:2)
    )),
    n = 1:2
))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14
In this particular case, though, it's not actually necessary to iterate over rows, because you can vectorize the inputs from x and instead iterate over n. The advantage is that a data frame usually has more rows than the number of new columns being added (n > p in the statistical sense), so the number of iterations will be [potentially much] lower. To be clear, whether such an approach is possible depends on which parameters of the function can accept vector arguments.
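A small base-R sketch can make the iteration count concrete. It wraps mult in a counter and compares a row-wise loop with a loop over n (counted_mult, calls, by_row and by_n are hypothetical names added for illustration):

```r
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

calls <- 0
counted_mult <- function(...) {
  calls <<- calls + 1  # record each call to mult()
  mult(...)
}

# Row-wise: one call per row of x, each with the full vector ns.
by_row <- Map(counted_mult, x$a, x$b, MoreArgs = list(n = ns))
row_calls <- calls

calls <- 0
# Column-wise: one call per value of n, with whole columns as input.
by_n <- lapply(ns, function(n) counted_mult(x$a, x$b, n))
n_calls <- calls

c(row_calls = row_calls, n_calls = n_calls)
```

With only three rows the difference is small, but the row-wise count grows with the number of rows while the column-wise count stays at length(ns).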
mult still needs to be called on the variables of x. The simplest way to do this is to pass them explicitly:
x %>% bind_cols(map_dfc(1:2, ~mult(x$a, x$b, .x)))
#>   a b V1 V2
#> 1 1 2  3  6
#> 2 2 3  5 10
#> 3 3 4  7 14
...but this loses the benefit of pmap, namely that named variables automatically get passed to the correct parameter. You can get that back by using purrr::lift, an adverb that changes the domain of a function so that it accepts a list, by wrapping it in do.call. The returned function can be called on x and the value of n for that iteration:
x %>% bind_cols(map_dfc(1:2, ~lift(mult)(x, n = .x)))
which is equivalent to
x %>% bind_cols(map_dfc(1:2, ~invoke(mult, x, n = .x)))
but the advantage of the former is that it returns a function which can be partially applied (with partial) on x, so it only has an n parameter left and needs no explicit reference to x, and so pipes better:
x %>% bind_cols(map_dfc(1:2, partial(lift(mult), .)))
All return the same thing. Names can be fixed after the fact with %>% set_names(~sub('^V(\\d+)$', 'new\\1', .x)), if you like.
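For readers without lift available (in recent purrr releases it is, as far as I know, deprecated), the same idea can be sketched in base R. This approximates the lift(mult)(x, n = .x) call above rather than reproducing purrr's implementation: do.call spreads the columns of x over the matching named parameters of mult. The V1/V2 names are set by hand here to mimic the output shown earlier, and cols and out are illustrative names:

```r
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n

# do.call() passes the columns of x as the named arguments a and b,
# plus the value of n for this iteration.
cols <- lapply(1:2, function(n) do.call(mult, c(as.list(x), list(n = n))))
names(cols) <- paste0("V", 1:2)  # mimic the default names in the output above

out <- cbind(x, as.data.frame(cols))

# Rename V1, V2 to new1, new2 with the same regular expression.
names(out) <- sub("^V(\\d+)$", "new\\1", names(out))
out
```

Note that this sketch assumes every column of x matches a parameter of mult; extra columns would be passed as unused arguments and cause an error.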