Add multiple output variables using purrr and a predefined function

Question

Take this simple dataset and function (representative of more complex problems):

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) (a + b) * n

Using base R's Map, I can add two new columns in a vectorised fashion:

ns <- 1:2
x[paste0("new",seq_along(ns))] <- Map(mult, x["a"], x["b"], n=ns)
x
#  a b new1 new2
#1 1 2    3    6
#2 2 3    5   10
#3 3 4    7   14
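
For readers wondering why single-bracket indexing works here, the following is a minimal sketch in base R only. It assumes nothing beyond the x, mult and ns defined above. x["a"] is a one-column data frame, which Map treats as a list of length 1 and recycles against ns, so mult is called once per value of n with the whole column as input.

```r
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

# x["a"] and x["b"] are lists of length 1, so they are recycled
# to the length of ns: one call to mult per element of ns.
res <- Map(mult, x["a"], x["b"], n = ns)

# The same calls written out explicitly with lapply.
explicit <- lapply(ns, function(n) mult(x$a, x$b, n))

length(res)  # 2
res[[2]]     # 6 10 14
```

Each element of the result is a full column, which is why it can be assigned straight into x[paste0("new", seq_along(ns))].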

A purrr attempt via pmap gets close, but returns a list:

library(purrr)
library(dplyr)
x %>% select(a,b) %>% pmap(mult, n=1:2)
#[[1]]
#[1] 3 6
#
#[[2]]
#[1]  5 10
#
#[[3]]
#[1]  7 14

My attempts from here with pmap_dfr etc. all seem to error out when trying to map this back to new columns.

How do I end up with two further variables that match my current "new1"/"new2"? I'm sure there is a simple incantation, but I'm clearly overlooking it or using the wrong *map* function.

There is some useful discussion in "How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs", but it seems overly hacky and inflexible for what I imagined was a simple problem.

Recommended answer

The best approach I've found (which is still not terribly elegant) is to pipe into bind_cols. For pmap_dfr to work correctly, the function should return a named list (which may or may not be a data frame):

library(tidyverse)

x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a,b,n) as.list(set_names((a + b) * n, paste0('new', n)))

x %>% bind_cols(pmap_dfr(., mult, n = 1:2))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14

To avoid changing the definition of mult, you can wrap it in an anonymous function:

mult <- function(a,b,n) (a + b) * n

x %>% bind_cols(pmap_dfr(
    ., 
    ~as.list(set_names(
        mult(...), 
        paste0('new', 1:2)
    )), 
    n = 1:2
))
#>   a b new1 new2
#> 1 1 2    3    6
#> 2 2 3    5   10
#> 3 3 4    7   14

In this particular case, though, it's not actually necessary to iterate over rows, because mult is already vectorised over the inputs from x, so you can iterate over n instead. The advantage is that usually n > p (in the statistical sense of more rows than variables, not the n argument of mult), so the number of iterations will be (potentially much) lower. To be clear, whether such an approach is possible depends on which parameters of the function can accept vector arguments.
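
To make the iteration counts concrete, here is a small base R sketch (not part of the original answer) that computes the same values both ways. It assumes the x and mult from the question. The row-wise version calls mult once per row, while the version that iterates over n calls it once per value of n.

```r
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

# Row-wise: nrow(x) calls, each returning length(ns) values.
# mapply returns one column per row here, so transpose to rows x ns.
by_row <- t(mapply(function(a, b) mult(a, b, ns), x$a, x$b))

# Over n: length(ns) calls, each returning nrow(x) values.
by_n <- sapply(ns, function(n) mult(x$a, x$b, n))

all(by_row == by_n)  # TRUE
```

With 3 rows and 2 values of n the saving is trivial, but the gap grows with the number of rows.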

mult still needs to be called on the variables of x. The simplest way to do this is to pass them explicitly:

x %>% bind_cols(map_dfc(1:2, ~mult(x$a, x$b, .x)))
#>   a b V1 V2
#> 1 1 2  3  6
#> 2 2 3  5 10
#> 3 3 4  7 14

...but this loses the benefit of pmap, namely that named variables are automatically passed to the matching parameters. You can get that back with purrr::lift, an adverb that changes the domain of a function so that it accepts a list, by wrapping the call in do.call. The returned function can be called on x together with the value of n for that iteration:

x %>% bind_cols(map_dfc(1:2, ~lift(mult)(x, n = .x)))

This is equivalent to

x %>% bind_cols(map_dfc(1:2, ~invoke(mult, x, n = .x)))

but the advantage of the former is that it returns a function that can be partially applied to x, leaving only the n parameter. It therefore needs no explicit references to x and pipes better:

x %>% bind_cols(map_dfc(1:2, partial(lift(mult), .)))

All of these return the same thing. If you like, the names can be fixed afterwards with %>% set_names(~sub('^V(\\d+)$', 'new\\1', .x)).
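
Since lift is described above as a wrapper around do.call, the same idea can be sketched in base R. This may be useful because, as far as I'm aware, lift and invoke are deprecated in purrr 1.0.0 and later. The helper call_on_columns below is hypothetical (it is not a purrr or base R function). It only assumes that the column names of the data frame match the argument names of the function.

```r
x <- data.frame(a = 1:3, b = 2:4)
mult <- function(a, b, n) (a + b) * n
ns <- 1:2

# Hypothetical helper: call f with the columns of df as named arguments,
# plus any extra named arguments supplied through ...
call_on_columns <- function(f, df, ...) {
  do.call(f, c(as.list(df), list(...)))
}

# One call per value of n; name the results up front so that
# no renaming step is needed afterwards.
new_cols <- lapply(ns, function(n) call_on_columns(mult, x, n = n))
names(new_cols) <- paste0("new", ns)

result <- cbind(x, as.data.frame(new_cols))
result
```

Naming the list before binding avoids the V1/V2 clean-up step. The trade-off is that every column of x must correspond to an argument of the function, just as with lift.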
