Apply data frame with list-variable of multivariable functions to a data frame with function arguments

Question

This dataframe contains what I'll call the "data":

library(tidyverse)
df_d <- data_frame(key = c("cat", "cat", "dog", "dog"), 
               value_1 = c(1,2,3,4), 
               value_2 = c(2,4,6,8))

Here is a dataframe that I intend to use as something like a function look-up table. f is a single variable function and f2 is a multivariable function:

df_f <- data_frame(key = c("cat", "dog"),
               f = c(function(x) x^2, function(x) sqrt(x)),
               f2 = c(function(x) (x[1]+x[2])^2, function(x) sqrt(x[1]+x[2])))

I can easily make a dataframe so that any cat row gets the cat functions and any dog row gets the dog functions:

df_both <- left_join(df_d, df_f)

I was able to figure out how to apply each of the f functions to, say, the value_1 column to get:

df_both %>% mutate(result = invoke_map_dbl(f, value_1))        
#> # A tibble: 4 x 6
#>   key   value_1 value_2 f      f2     result
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00

My question is: how can I create a column result2 that takes each function in f2 and uses c(value_1, value_2) as its input? If re-defining the functions in f2 to be explicitly functions of two variables makes things much easier, that's fine too.

Desired output:

#> # A tibble: 4 x 7
#>   key   value_1 value_2 f      f2     result result2
#>   <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
#> 1 cat      1.00    2.00 <fn>   <fn>     1.00    9.00
#> 2 cat      2.00    4.00 <fn>   <fn>     4.00   36.0 
#> 3 dog      3.00    6.00 <fn>   <fn>     1.73    3.00
#> 4 dog      4.00    8.00 <fn>   <fn>     2.00    3.46

(Question motivated by an unfortunately self-deleted question from earlier today.)

Solution

"If re-defining the functions in f2 to be explicitly functions of two variables makes things much easier, that's fine too."

Yes, that would be a more natural situation here, I think. Otherwise the arguments are stored row-wise, and the data should possibly be reshaped.
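If you would rather keep f2 as functions of a single length-two vector, as in the question, one possible sketch is to walk the rows and build c(value_1, value_2) for each call. This uses base R's mapply and tibble() in place of data_frame(), neither of which comes from the answer itself, so treat it as an illustration under those assumptions:

```r
library(dplyr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))

# f2 keeps the question's original form: one argument, a length-two vector
df_f <- tibble(key = c("cat", "dog"),
               f2 = list(function(x) (x[1] + x[2])^2,
                         function(x) sqrt(x[1] + x[2])))

df_both <- left_join(df_d, df_f, by = "key")

# Call each row's function on that row's pair of values
res_vec <- df_both %>%
  mutate(result2 = mapply(function(fn, a, b) fn(c(a, b)),
                          f2, value_1, value_2))
```

Like the row-by-row approaches in this answer, this calls a function once per row.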

Redefining your functions:

df_f <- data_frame(key = c("cat", "dog"),
                   f = c(function(x) x^2, function(x) sqrt(x)),
                   f2 = c(function(x, y) (x + y)^2, function(x, y) sqrt(x + y)))
df_both <- left_join(df_d, df_f)

Now you can again use invoke_map_dbl, passing .x as a list, although you need to turn the lists inside out using transpose:

mutate(
  df_both,
  result  = invoke_map_dbl(f, value_1),
  result2 = invoke_map_dbl(f2, transpose(list(value_1, value_2)))
)

# A tibble: 4 x 7
  key   value_1 value_2 f      f2     result result2
  <chr>   <dbl>   <dbl> <list> <list>  <dbl>   <dbl>
1 cat        1.      2. <fn>   <fn>     1.00    9.00
2 cat        2.      4. <fn>   <fn>     4.00   36.0 
3 dog        3.      6. <fn>   <fn>     1.73    3.00
4 dog        4.      8. <fn>   <fn>     2.00    3.46
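In purrr 1.0.0 and later the invoke_map family is deprecated, so on a current installation the call above may emit a warning. A rough equivalent, assuming map2_dbl and pmap_dbl as stand-ins (they are not part of the original answer), could look like this:

```r
library(dplyr)
library(purrr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))

df_f <- tibble(key = c("cat", "dog"),
               f  = list(function(x) x^2, function(x) sqrt(x)),
               f2 = list(function(x, y) (x + y)^2,
                         function(x, y) sqrt(x + y)))

df_both <- left_join(df_d, df_f, by = "key")

res_map <- df_both %>%
  mutate(
    # pair each function with its single argument
    result  = map2_dbl(f, value_1, function(fn, x) fn(x)),
    # iterate over function and both arguments in parallel
    result2 = pmap_dbl(list(f2, value_1, value_2),
                       function(fn, x, y) fn(x, y))
  )
```

Here pmap_dbl walks the three vectors in parallel, so no transpose step is needed.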

A set of three-argument functions would then simply extend to invoke_map_dbl(f3, transpose(list(value_1, value_2, value_3))).
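To make the "inside out" step concrete, here is a small sketch of what transpose produces for three argument vectors. The bodies of f3 and the values in value_3 are made up for illustration, and do.call with map2_dbl stands in for invoke_map_dbl:

```r
library(purrr)

# Hypothetical inputs: two rows, three arguments each
value_1 <- c(1, 2)
value_2 <- c(2, 4)
value_3 <- c(3, 6)

# Hypothetical three-argument functions, one per row
f3 <- list(function(x, y, z) (x + y + z)^2,
           function(x, y, z) sqrt(x + y + z))

# One list per argument becomes one list per row:
# list(list(1, 2, 3), list(2, 4, 6))
args_by_row <- transpose(list(value_1, value_2, value_3))

# Call each function with its own row of arguments
result3 <- map2_dbl(f3, args_by_row, function(fn, args) do.call(fn, args))
```

Each element of args_by_row holds the full argument list for one row, which is the shape that a per-row function call needs.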

Note that this kind of approach will not work well on large datasets, since it calls a function once per row rather than using vectorization.

A more scalable alternative may involve nesting, where you at least apply each function once within each group:

df_both %>%
  group_by(key) %>%
  nest() %>%
  mutate(data = map(
    data,
    ~ mutate(., result = first(f)(value_1), result2 = first(f2)(value_1, value_2))
  )) %>%
  unnest(data)

Which gives the same result.
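The same per-group idea can probably be written without nesting at all, since a grouped mutate already evaluates each expression once per group. This is a sketch rather than part of the original answer, and it assumes every row in a group carries the same function, so taking the first element with [[1]] is safe:

```r
library(dplyr)

df_d <- tibble(key = c("cat", "cat", "dog", "dog"),
               value_1 = c(1, 2, 3, 4),
               value_2 = c(2, 4, 6, 8))

df_f <- tibble(key = c("cat", "dog"),
               f  = list(function(x) x^2, function(x) sqrt(x)),
               f2 = list(function(x, y) (x + y)^2,
                         function(x, y) sqrt(x + y)))

df_both <- left_join(df_d, df_f, by = "key")

res_grouped <- df_both %>%
  group_by(key) %>%
  # within a group, f and f2 repeat the same function; apply it to whole columns
  mutate(result  = f[[1]](value_1),
         result2 = f2[[1]](value_1, value_2)) %>%
  ungroup()
```

Each function is then called once per group on full vectors, which is where the scalability gain comes from.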
