Looping across multiple variables and parameters using map() and mutate()

Question

I'm having trouble figuring out how to effectively map across multiple parameters and variables within a tbl to generate new variables.

In the "real" version, I basically have one mathematical function generating a central estimate, and I need to run a whole series of sensitivity tests varying different parameters. I'm trying to figure out how to do this within the tidyverse. It looks like map() and mutate() are the answers to this, but I'm having trouble.

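    # packages used below
    library(dplyr)   # tibble(), mutate(), %>%
    library(purrr)   # map()
    library(rlang)   # enquo(), quo_name()

    # building the practice dataset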
    pracdf <- tibble(ID = letters,
             p = runif(26, 100, 1000),
             med.a = runif(26),
             med.b = runif(26),
             c = runif(26))

    pracdf <- pracdf %>%
      mutate(low.a = med.a * 0.8,
             low.b = med.b * 0.8,
             high.a = med.a * 1.2,
             high.b = med.b * 1.2)
    # this generates a few low/med/high values for variables


    # the function
    pracdf <- pracdf %>% mutate(d = p * med.a * med.b * c)
    # works as expected. Now can I loop it with dynamic variable names?


    f1 <- function(df, var.a) {
      var.a <- enquo(var.a)
      print(var.a)
      d.name <- paste0("d.", quo_name(var.a))
      print(d.name)

      df %>% mutate(!!d.name := p * (!!var.a) * c)
    }

    pracdf2 <- f1(pracdf, med.a)
    # works great! Eventually I want to loop through low, med, high. Start with a loop of 1

    pracdf3 <- map(list(med.a), f1, df = pracdf)
    # loop crashes spectacularly
    pracdf3 <- map(list(med.a), ~f1, df = pracdf)
    # failure
    pracdf3 <- map(med.a, ~f1, df = pracdf)
    # what am I doing with my life
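One way the attempted loop could be made to work is sketched below, assuming the column names are passed as strings rather than bare symbols. All three `map()` calls above fail before `f1()` is ever called, because `med.a` is evaluated as an ordinary R object outside the data frame, where it does not exist. The helper `f2()` is hypothetical (it is not from the original thread): it looks the column up with the `.data` pronoun, and `purrr::reduce()` passes each result into the next call so that all the new columns end up in a single tibble. The data are random practice values in the question's layout.

```r
library(dplyr)
library(purrr)

# practice data in the question's layout (low/med/high copies of a)
set.seed(1)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 c = runif(26)) %>%
  mutate(low.a = med.a * 0.8,
         high.a = med.a * 1.2)

# hypothetical string-based variant of f1(): `var` is a column name as a string
f2 <- function(df, var) {
  d.name <- paste0("d.", var)
  # .data[[var]] looks the column up inside df; !!d.name := names the new column
  df %>% mutate(!!d.name := p * .data[[var]] * c)
}

# map() returns a list with one tibble (one new column each) per name ...
res_list <- map(c("low.a", "med.a", "high.a"), ~ f2(pracdf, .x))

# ... whereas reduce() threads one tibble through every call,
# adding d.low.a, d.med.a and d.high.a to the same data
pracdf3 <- reduce(c("low.a", "med.a", "high.a"), f2, .init = pracdf)
```

`reduce()` is the better fit when the goal is one wide table; `map()` is the closer match to the original attempts but leaves the results in a list.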


Recommended answer

I think one of the issues making this task difficult is that the current setup might not be very "tidy". For example, low.a, low.b, med.a, etc. appear to be what I understand to be 'untidy' columns: the level (low, med or high) is stored in the column name rather than in a variable of its own.
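To make the 'untidy' point concrete, here is a minimal sketch, assuming tidyr 1.0.0 or later, of how `tidyr::pivot_longer()` with the `.value` sentinel could move the level out of the question's column names into its own column, so that d is computed once for every level. This is an alternative illustration, not part of the original answer; the data are random practice values built the same way as in the question.

```r
library(dplyr)
library(tidyr)

# the question's wide layout: the level is encoded in the column names
set.seed(1)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 med.b = runif(26),
                 c = runif(26)) %>%
  mutate(low.a = med.a * 0.8, low.b = med.b * 0.8,
         high.a = med.a * 1.2, high.b = med.b * 1.2)

long_df <- pracdf %>%
  pivot_longer(
    cols = matches("^(low|med|high)\\."),  # only the level-prefixed columns
    names_to = c("level", ".value"),       # "low.a" -> level = "low", value goes to column a
    names_sep = "\\."                      # split each name on the literal dot
  ) %>%
  mutate(d = p * a * b * c)                # one formula covers every level
```

The result has one row per ID and level (78 rows) with plain `a` and `b` columns, which is the shape the answer below builds from scratch.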

Below is one possible approach (which can probably be improved) that doesn't use a for loop or a custom function at all. The key idea is to take the initial pracdf and expand the existing rows so there is one row for each "level" (i.e., low, med, and high). Doing this lets us calculate d in a single step, with no loop over low, med, and high.

(Edited for readability and to include Jens Leerssen's suggestions)

library(dplyr)
library(tidyr)
set.seed(123)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 a = runif(26),
                 b = runif(26),
                 c = runif(26))

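# one row per sensitivity level, with its multiplier
levdf <- tibble(level = c("low", "med", "high"),
                level_val = c(0.8, 1.0, 1.2))

# merge() with no shared columns returns the Cartesian product:
# every ID is paired with every level (26 x 3 = 78 rows)
tidy_df <- pracdf %>% merge(levdf) %>%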
  mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
  select(-level_val) %>% arrange(ID) %>% as_tibble()

tidy_df

#> # A tibble: 78 x 7
#>       ID        p         a         b         c level         d
#>    <chr>    <dbl>     <dbl>     <dbl>     <dbl> <chr>     <dbl>
#>  1     a 358.8198 0.5440660 0.7989248 0.3517979   low 35.116168
#>  2     a 358.8198 0.5440660 0.7989248 0.3517979   med 54.869013
#>  3     a 358.8198 0.5440660 0.7989248 0.3517979  high 79.011379
#>  4     b 809.4746 0.5941420 0.1218993 0.1111354   low  4.169914
#>  5     b 809.4746 0.5941420 0.1218993 0.1111354   med  6.515490
#>  6     b 809.4746 0.5941420 0.1218993 0.1111354  high  9.382306
#>  7     c 468.0792 0.2891597 0.5609480 0.2436195   low 11.837821
#>  8     c 468.0792 0.2891597 0.5609480 0.2436195   med 18.496595
#>  9     c 468.0792 0.2891597 0.5609480 0.2436195  high 26.635096
#> 10     d 894.7157 0.1471136 0.2065314 0.6680556   low 11.622957
#> # ... with 68 more rows
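The key step above is `merge()`: when the two tables share no column names, base R's `merge()` returns their Cartesian product. A minimal sketch of the same step written as an explicit cross join with `dplyr::cross_join()`, assuming dplyr 1.1.0 or later (older code often reaches for `tidyr::crossing()` for the same purpose). It computes the same d values; because `cross_join()` keeps the tibble class, the final `as_tibble()` is no longer needed.

```r
library(dplyr)

set.seed(123)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 a = runif(26),
                 b = runif(26),
                 c = runif(26))

levdf <- tibble(level = c("low", "med", "high"),
                level_val = c(0.8, 1.0, 1.2))

# explicit cross join: 26 IDs x 3 levels = 78 rows
tidy_df2 <- pracdf %>%
  cross_join(levdf) %>%
  mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
  select(-level_val) %>%
  arrange(ID)
```

Writing the join as `cross_join()` states the intent directly, whereas `merge()` only produces a cross join as a side effect of the two tables having no column names in common.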

However, the result above might not be the format you want the final data in. We can take care of this by gathering and spreading tidy_df with tidyr::gather and tidyr::spread.

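tidy_df %>%
  # long: one row per ID, level and variable (a, b or d)
  gather(variable, value, a, b, d) %>%
  # combine level and variable into one key, e.g. "low_a"
  unite(level_variable, level, variable) %>%
  # wide: one column per key
  spread(level_variable, value)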

#> # A tibble: 26 x 12
#>       ID        p         c     high_a     high_b     high_d      low_a
#>  * <chr>    <dbl>     <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
#>  1     a 358.8198 0.3517979 0.54406602 0.79892485  79.011379 0.54406602
#>  2     b 809.4746 0.1111354 0.59414202 0.12189926   9.382306 0.59414202
#>  3     c 468.0792 0.2436195 0.28915974 0.56094798  26.635096 0.28915974
#>  4     d 894.7157 0.6680556 0.14711365 0.20653139  26.151654 0.14711365
#>  5     e 946.4206 0.4176468 0.96302423 0.12753165  69.905442 0.96302423
#>  6     f 141.0008 0.7881958 0.90229905 0.75330786 108.778072 0.90229905
#>  7     g 575.2949 0.1028646 0.69070528 0.89504536  52.681362 0.69070528
#>  8     h 903.1771 0.4348927 0.79546742 0.37446278 168.480110 0.79546742
#>  9     i 596.2915 0.9849570 0.02461368 0.66511519  13.845603 0.02461368
#> 10     j 510.9533 0.8930511 0.47779597 0.09484066  29.775361 0.47779597
#> # ... with 16 more rows, and 5 more variables: low_b <dbl>, low_d <dbl>,
#> #   med_a <dbl>, med_b <dbl>, med_d <dbl>
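`gather()` and `spread()` still work, but since tidyr 1.0.0 they have been superseded by `pivot_longer()` and `pivot_wider()`. A minimal sketch of the same reshape with the newer verbs; it rebuilds tidy_df first so that it runs on its own, and the column order of the wide result may differ from the `spread()` output above.

```r
library(dplyr)
library(tidyr)

set.seed(123)
pracdf <- tibble(ID = letters, p = runif(26, 100, 1000),
                 a = runif(26), b = runif(26), c = runif(26))
levdf <- tibble(level = c("low", "med", "high"),
                level_val = c(0.8, 1.0, 1.2))

tidy_df <- pracdf %>% merge(levdf) %>%
  mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
  select(-level_val) %>% arrange(ID) %>% as_tibble()

wide_df <- tidy_df %>%
  # long: one row per ID x level x variable
  pivot_longer(c(a, b, d), names_to = "variable", values_to = "value") %>%
  # wide: one column per level/variable pair (names joined with "_", e.g. low_a)
  pivot_wider(names_from = c(level, variable), values_from = value)
```

`pivot_wider()` builds the combined column names itself, so the separate `unite()` step is not needed.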
