Looping across multiple variables and parameters using map() and mutate()
Question
I'm having trouble figuring out how to effectively map across multiple parameters and variables within a tbl to generate new variables.
In the "real" version, I basically have one mathematical function that generates a central estimate, and I need to run a whole series of sensitivity tests that vary different parameters. I'm trying to figure out how to do this within the tidyverse. It looks like map() and mutate() are the answer, but I can't get them to work.
# building the practice dataset
library(tidyverse)  # tibble(), mutate(), map(), enquo(), quo_name()
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 med.b = runif(26),
                 c = runif(26))
pracdf <- pracdf %>%
  mutate(low.a = med.a * 0.8,
         low.b = med.b * 0.8,
         high.a = med.a * 1.2,
         high.b = med.b * 1.2)
# this generates a few low/med/high values for variables
# the function
pracdf <- pracdf %>% mutate(d = p * med.a * med.b * c)
# works as expected. Now can I loop it with dynamic variable names?
f1 <- function(df, var.a) {
  var.a <- enquo(var.a)
  print(var.a)
  d.name <- paste0("d.", quo_name(var.a))
  print(d.name)
  df %>% mutate(!!d.name := p * (!!var.a) * c)
}
pracdf2 <- f1(pracdf, med.a)
# works great! Eventually I want to loop through low, med, high. Start with a loop of 1
pracdf3 <- map(list(med.a), f1, df = pracdf)
# loop crashes spectacularly
pracdf3 <- map(list(med.a), ~f1, df = pracdf)
# failure
pracdf3 <- map(med.a, ~f1, df = pracdf)
# what am I doing with my life
Recommended answer
I think one of the things making this task difficult is that the current setup might not be very "tidy". For example, low.a, low.b, med.a, etc. appear to be examples of what I understand to be "untidy" columns.
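As a side note on the map() attempts in the question: map() evaluates list(med.a) before f1 is ever called, so R looks for an object named med.a outside the data frame and fails. If you do want to keep the wide layout, one way around this is to pass the column names as strings. The following is only a sketch under that assumption; f1_chr is a hypothetical string-based variant of f1 (not part of the original code), and the b columns are left out for brevity.

```r
library(dplyr)
library(purrr)

set.seed(123)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 med.a = runif(26),
                 c = runif(26)) %>%
  mutate(low.a = med.a * 0.8,
         high.a = med.a * 1.2)

# Hypothetical variant of f1: `var` is a column name supplied as a string
f1_chr <- function(df, var) {
  d.name <- paste0("d.", var)
  # .data[[var]] looks the column up by name inside the data frame
  df %>% mutate(!!d.name := p * .data[[var]] * c)
}

# reduce() feeds the growing data frame back into f1_chr for each name,
# so every call adds one new d.* column
pracdf3 <- reduce(c("low.a", "med.a", "high.a"), f1_chr, .init = pracdf)
names(pracdf3)
```

reduce() is used instead of map() here because each call should add a column to the same data frame rather than return a separate data frame per variable.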
Below is one possible approach (which I am fairly sure can be improved) that doesn't use a for loop or custom function at all. The key idea is to take the initial pracdf and expand the existing rows so there is one row for each "level" (i.e., low, med, and high). Doing this lets us calculate d in a single step, with no looping over low, med, and high.
(Edited for readability and to include Jens Leerssen's suggestions)
library(dplyr)
library(tidyr)
set.seed(123)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 a = runif(26),
                 b = runif(26),
                 c = runif(26))
levdf <- tibble(level = c("low", "med", "high"),
                level_val = c(0.8, 1.0, 1.2))
# merge() with no shared column names returns every pracdf row paired with every level
tidy_df <- pracdf %>% merge(levdf) %>%
  mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
  select(-level_val) %>% arrange(ID) %>% as_tibble()
tidy_df
#> # A tibble: 78 x 7
#> ID p a b c level d
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 a 358.8198 0.5440660 0.7989248 0.3517979 low 35.116168
#> 2 a 358.8198 0.5440660 0.7989248 0.3517979 med 54.869013
#> 3 a 358.8198 0.5440660 0.7989248 0.3517979 high 79.011379
#> 4 b 809.4746 0.5941420 0.1218993 0.1111354 low 4.169914
#> 5 b 809.4746 0.5941420 0.1218993 0.1111354 med 6.515490
#> 6 b 809.4746 0.5941420 0.1218993 0.1111354 high 9.382306
#> 7 c 468.0792 0.2891597 0.5609480 0.2436195 low 11.837821
#> 8 c 468.0792 0.2891597 0.5609480 0.2436195 med 18.496595
#> 9 c 468.0792 0.2891597 0.5609480 0.2436195 high 26.635096
#> 10 d 894.7157 0.1471136 0.2065314 0.6680556 low 11.622957
#> # ... with 68 more rows
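The merge() call above works because the two tables share no column names, so it returns every combination of rows. If you would rather state that intent explicitly, a minimal sketch using dplyr::cross_join() could look like the following. This assumes dplyr 1.1.0 or later, where cross_join() was introduced; tidy_df2 is just an illustrative name.

```r
library(dplyr)

set.seed(123)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 a = runif(26),
                 b = runif(26),
                 c = runif(26))
levdf <- tibble(level = c("low", "med", "high"),
                level_val = c(0.8, 1.0, 1.2))

# cross_join() pairs every row of pracdf with every row of levdf (26 * 3 rows)
tidy_df2 <- pracdf %>%
  cross_join(levdf) %>%
  mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
  select(-level_val)

nrow(tidy_df2)
```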
However, the result above might not be the format you want the final data in. We can take care of this by gathering and spreading tidy_df using tidyr::gather and tidyr::spread.
tidy_df %>%
  gather(variable, value, a, b, d) %>%
  unite(level_variable, level, variable) %>%
  spread(level_variable, value)
#> # A tibble: 26 x 12
#> ID p c high_a high_b high_d low_a
#> * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 358.8198 0.3517979 0.54406602 0.79892485 79.011379 0.54406602
#> 2 b 809.4746 0.1111354 0.59414202 0.12189926 9.382306 0.59414202
#> 3 c 468.0792 0.2436195 0.28915974 0.56094798 26.635096 0.28915974
#> 4 d 894.7157 0.6680556 0.14711365 0.20653139 26.151654 0.14711365
#> 5 e 946.4206 0.4176468 0.96302423 0.12753165 69.905442 0.96302423
#> 6 f 141.0008 0.7881958 0.90229905 0.75330786 108.778072 0.90229905
#> 7 g 575.2949 0.1028646 0.69070528 0.89504536 52.681362 0.69070528
#> 8 h 903.1771 0.4348927 0.79546742 0.37446278 168.480110 0.79546742
#> 9 i 596.2915 0.9849570 0.02461368 0.66511519 13.845603 0.02461368
#> 10 j 510.9533 0.8930511 0.47779597 0.09484066 29.775361 0.47779597
#> # ... with 16 more rows, and 5 more variables: low_b <dbl>, low_d <dbl>,
#> # med_a <dbl>, med_b <dbl>, med_d <dbl>
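In current versions of tidyr, gather() and spread() are marked as superseded in favour of pivot_longer() and pivot_wider(). Here is a rough equivalent of the reshaping step, assuming tidyr 1.0.0 or later; wide_df is just an illustrative name, and the column order may differ from the spread() output above.

```r
library(dplyr)
library(tidyr)

set.seed(123)
pracdf <- tibble(ID = letters,
                 p = runif(26, 100, 1000),
                 a = runif(26),
                 b = runif(26),
                 c = runif(26))
levdf <- tibble(level = c("low", "med", "high"),
                level_val = c(0.8, 1.0, 1.2))
tidy_df <- pracdf %>% merge(levdf) %>%
  mutate(d = p * (level_val * a) * (level_val * b) * c) %>%
  select(-level_val) %>% arrange(ID) %>% as_tibble()

wide_df <- tidy_df %>%
  # stack a, b and d into name/value pairs
  pivot_longer(c(a, b, d), names_to = "variable", values_to = "value") %>%
  # build columns such as low_a or high_d; "_" is the default separator
  pivot_wider(names_from = c(level, variable), values_from = value)

dim(wide_df)
```

Passing both level and variable to names_from takes the place of the separate unite() step.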