In R, apply a function to the rows of a data frame and return a data frame

Question

I am trying to apply a self-written function to the rows of a data frame.

library(dplyr) # only used for tibble(); data_frame() is deprecated
DF = tibble(x = c(50, 49, 20), y = c(132, 124, 130), z = c(0.82, 1, 0.63))

     x     y     z
   <dbl> <dbl> <dbl>
1    50   132  0.82
2    49   124  1.00
3    20   130  0.63

The actual data frame has thousands of rows; this is just a sample.

My function is fairly complicated and does many things; for each row of DF it returns a new row. For simplicity, say the function adds 1 to column 1, 2 to column 2, and 3 to column 3 (this could of course be vectorized, but my function, let's call it Funct, does much more). So:

Funct = function(DF) {
   DF[1] = DF[1] + 1
   DF[2] = DF[2] + 2
   DF[3] = DF[3] + 3
   return(DF)
}

How do I apply this function in the most efficient way, so that I end up with a new data frame containing the output:

> DF
     x     y     z
   <dbl> <dbl> <dbl>
1    51   134  3.82
2    50   126  4.00
3    21   132  3.63


Recommended answer

apply is a poor choice for data frames because it is designed for matrices, so it coerces a data frame to a matrix before iterating. Aside from the conversion occasionally being expensive (and having to be reversed afterwards), the real problem is that a matrix in R can hold only a single type, whereas a data frame can have a different type for each variable. So while it works fine for the data here, you will often end up with type coercion happening in a matrix you can't see, for example numbers coerced to character because another column is a factor. If you really want to use apply, explicitly coerce to a matrix beforehand so you can see what it is working with; you'll avoid a lot of annoying bugs.
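The hidden coercion is easy to reproduce. A minimal base R sketch, assuming a hypothetical data frame `mixed` (not part of the question) with one numeric column and one factor column:

```r
# Hypothetical example data: one numeric column and one factor column.
mixed <- data.frame(x = c(50, 49, 20),
                    grp = factor(c("a", "b", "a")))

# apply() converts a data frame with as.matrix(). A matrix holds a single
# type, so the numeric column is turned into character strings.
m <- as.matrix(mixed)
is.character(m)
#> [1] TRUE

# Inside apply(), each row therefore arrives as a character vector,
# and arithmetic such as row["x"] + 1 would fail.
row_classes <- apply(mixed, 1, function(row) class(row))
```

Every element of `row_classes` is "character", even though `x` was numeric in the data frame.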

But there is a better option than apply: iterate in parallel over the variables (columns), then coerce the resulting list back to a data frame. purrr::pmap_dfr handles both parts:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
DF = tibble(x = c(50, 49, 20), 
            y = c(132, 124, 130), 
            z = c(0.82, 1, 0.63))

DF %>% 
    pmap_dfr(~list(x = ..1 + 1,
                   y = ..2 + 2,
                   z = ..3 + 3))
#> # A tibble: 3 x 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1   51.  134.  3.82
#> 2   50.  126.  4.00
#> 3   21.  132.  3.63
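
In purrr 1.0.0 and later, pmap_dfr() is marked as superseded in favour of pmap() followed by list_rbind(). A sketch of the same computation in that style, assuming purrr >= 1.0.0 is installed; the helper name `add_offsets` is made up for illustration and stands in for the real Funct:

```r
library(purrr)
library(tibble)

DF <- tibble(x = c(50, 49, 20),
             y = c(132, 124, 130),
             z = c(0.82, 1, 0.63))

# Hypothetical stand-in for the real function: it receives one row's values
# as named arguments (matched to the column names) and returns a one-row tibble.
add_offsets <- function(x, y, z) {
  tibble(x = x + 1, y = y + 2, z = z + 3)
}

rows <- pmap(DF, add_offsets)  # list with one single-row tibble per input row
result <- list_rbind(rows)     # stack the pieces back into one tibble
```

Because pmap() passes the columns by name, a named function like this tends to be easier to read than the positional ..1, ..2, ..3 placeholders once the real function grows.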

You can do the same in base R with

do.call(rbind, do.call(Map, 
                       c(function(...){
                           data.frame(x = ..1 + 1,
                                      y = ..2 + 2,
                                      z = ..3 + 3)
                       }, 
                       DF)
))
#>    x   y    z
#> 1 51 134 3.82
#> 2 50 126 4.00
#> 3 21 132 3.63

...though it's not terribly pretty.
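
Since the Funct in the question takes a whole row rather than separate column values, another base R sketch is to hand it one-row data frames and bind the results. This assumes the real function, like the simplified one, returns a one-row data frame with the same columns each time:

```r
DF <- data.frame(x = c(50, 49, 20),
                 y = c(132, 124, 130),
                 z = c(0.82, 1, 0.63))

# The simplified function from the question; here it receives a one-row
# data frame and returns a one-row data frame.
Funct <- function(DF) {
  DF[1] <- DF[1] + 1
  DF[2] <- DF[2] + 2
  DF[3] <- DF[3] + 3
  return(DF)
}

# Call Funct once per row, then stack the one-row results.
pieces <- lapply(seq_len(nrow(DF)), function(i) Funct(DF[i, ]))
result <- do.call(rbind, pieces)
```

Each column keeps its own type here because no matrix conversion is involved, but creating one data frame per row is slow for large inputs.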

Note that a vectorized solution, when possible, will be much, much faster:

DF %>% 
    mutate(x = x + 1,
           y = y + 2,
           z = z + 3)
#> # A tibble: 3 x 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1   51.  134.  3.82
#> 2   50.  126.  4.00
#> 3   21.  132.  3.63
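
To get a feel for the difference on your own machine, a rough base R sketch along these lines may help. The data size, the seed, and the helper names `row_wise` and `vectorized` are arbitrary choices for illustration, and the timings depend on the machine:

```r
set.seed(1)
n <- 2000
big <- data.frame(x = runif(n), y = runif(n), z = runif(n))

# Row-wise: build one small data frame per row, then bind them all together.
row_wise <- function(d) {
  do.call(rbind, Map(function(x, y, z) {
    data.frame(x = x + 1, y = y + 2, z = z + 3)
  }, d$x, d$y, d$z))
}

# Vectorized: operate on whole columns at once.
vectorized <- function(d) {
  transform(d, x = x + 1, y = y + 2, z = z + 3)
}

t_row <- system.time(slow <- row_wise(big))
t_vec <- system.time(fast <- vectorized(big))

# Both approaches should produce the same values.
same <- isTRUE(all.equal(slow, fast, check.attributes = FALSE))

print(t_row["elapsed"])
print(t_vec["elapsed"])
```

system.time() is a coarse measurement; for anything more careful, a dedicated benchmarking package would be a better fit.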
