In R, apply a function to the rows of a data frame and return a data frame
Question
I am trying to apply a self-written function to the rows of a data frame.
library(dplyr) # only used for tibble(); data_frame() is deprecated
DF <- tibble(x = c(50, 49, 20), y = c(132, 124, 130), z = c(0.82, 1, 0.63))
      x     y     z
  <dbl> <dbl> <dbl>
1    50   132  0.82
2    49   124  1.00
3    20   130  0.63
The actual data frame has thousands of rows; this is just a sample.
My function is very complicated and does many things; in the end it produces one new row for each row of DF. For simplicity, say the function adds 1 to column 1, 2 to column 2 and 3 to column 3 (this can of course be vectorized, but my real function, let's call it Funct, does much more). So:
Funct <- function(DF) {
  DF[1] <- DF[1] + 1
  DF[2] <- DF[2] + 2
  DF[3] <- DF[3] + 3
  return(DF)
}
How do I apply this function in the most efficient way, so that I end up with a new data frame containing the output:
> DF
      x     y     z
  <dbl> <dbl> <dbl>
1    51   134  3.82
2    50   126  4.00
3    21   132  3.63
Recommended answer
apply() is a bad option for data frames because it is designed for matrices, and thus will coerce a data frame input to a matrix before iterating. Aside from occasionally being an expensive conversion (which has to be reversed afterwards), the real problem is that a matrix in R can only hold a single type, whereas a data frame can have a different type for each variable. Thus, while it will work fine for the data here, you'll often end up with type coercion happening in a matrix you can't see, for example numbers being coerced to character because another column is a factor. If you really want to use apply(), explicitly coerce to a matrix beforehand so you can see what it is working with, and you'll avoid a lot of annoying bugs.
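The coercion described above can be seen in a small base R sketch. The data frame df and its columns n and g are hypothetical and exist only for this illustration; they are not part of the question's data.

```r
# Hypothetical data frame: one numeric column and one factor column
df <- data.frame(n = c(1.5, 2.5), g = factor(c("a", "b")))

# apply() converts a data frame with as.matrix() before iterating,
# so doing the conversion by hand shows what the function will receive
m <- as.matrix(df)
typeof(m)  # the numeric column has been turned into character too

# Each "row" passed to the function is a character vector, not mixed types
res <- apply(df, 1, function(row) class(row))
res
```

Inspecting m before calling apply() makes the lost numeric type visible, which is the reason for coercing explicitly.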
But there's a better option than apply(): iterate in parallel over the variables (columns), which amounts to visiting the rows one at a time, and then combine the resulting list back into a data frame. purrr::pmap_dfr will handle both parts:
library(tidyverse)

DF <- tibble(x = c(50, 49, 20),
             y = c(132, 124, 130),
             z = c(0.82, 1, 0.63))

# ..1, ..2, ..3 are the values of the first, second and third column in each row
DF %>%
  pmap_dfr(~list(x = ..1 + 1,
                 y = ..2 + 2,
                 z = ..3 + 3))
#> # A tibble: 3 x 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 51. 134. 3.82
#> 2 50. 126. 4.00
#> 3 21. 132. 3.63
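As a variation on the call above, here is a minimal sketch that passes a named function instead of a formula. pmap functions match the elements of a named list (here the columns) to the function's arguments by name, so the body can refer to x, y and z rather than ..1, ..2 and ..3. The name row_fun is made up for this example, and the data are built with base data.frame() only to keep the dependencies small. As far as I know, recent purrr releases mark pmap_dfr as superseded in favour of pmap() followed by list_rbind(), though it still works; it needs dplyr to be installed.

```r
library(purrr)

# Hypothetical stand-in for the real row function: takes one value per
# column and returns a named list, which becomes one row of the result
row_fun <- function(x, y, z) {
  list(x = x + 1, y = y + 2, z = z + 3)
}

DF <- data.frame(x = c(50, 49, 20),
                 y = c(132, 124, 130),
                 z = c(0.82, 1, 0.63))

# Columns are matched to row_fun's arguments by name, row by row
res <- pmap_dfr(DF, row_fun)
res
```

Matching by name keeps the function independent of column order, at the cost of requiring the argument names to agree with the column names.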
You can do the same thing with base R:
do.call(rbind,
        do.call(Map,
                c(function(...) {
                    data.frame(x = ..1 + 1,
                               y = ..2 + 2,
                               z = ..3 + 3)
                  },
                  DF)))
#> x y z
#> 1 51 134 3.82
#> 2 50 126 4.00
#> 3 21 132 3.63
...though it's not terribly pretty.
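If the row function expects a one-row data frame, as Funct in the question does, a somewhat more readable base R sketch is to loop over the row indices and bind the results. This is an alternative to the Map approach, not the answer's own method, and it assumes a plain data.frame rather than a tibble.

```r
# Funct as defined in the question, with the argument renamed for clarity
Funct <- function(row) {
  row[1] <- row[1] + 1
  row[2] <- row[2] + 2
  row[3] <- row[3] + 3
  row
}

DF <- data.frame(x = c(50, 49, 20),
                 y = c(132, 124, 130),
                 z = c(0.82, 1, 0.63))

# drop = FALSE keeps each row as a one-row data frame, so every column
# retains its own type instead of being collapsed into a single vector
rows <- lapply(seq_len(nrow(DF)), function(i) Funct(DF[i, , drop = FALSE]))

# Stack the one-row results back into a single data frame
res <- do.call(rbind, rows)
res
```

Calling rbind once on the whole list avoids growing the result inside the loop, although binding many one-row data frames is still slow compared with a vectorized version.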
Note that a vectorized solution, when possible, will be much, much faster:
DF %>%
  mutate(x = x + 1,
         y = y + 2,
         z = z + 3)
#> # A tibble: 3 x 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 51. 134. 3.82
#> 2 50. 126. 4.00
#> 3 21. 132. 3.63
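To get a feel for the difference on your own machine, a rough base R sketch like the following times both approaches on a larger, randomly generated data frame. The names big, row_wise and vectorized are invented for this example, and the timings will vary from system to system, so none are shown.

```r
set.seed(1)
n <- 2000
big <- data.frame(x = runif(n), y = runif(n), z = runif(n))

# Row-wise: build one small data frame per row, then bind them all
row_wise <- function(d) {
  do.call(rbind,
          Map(function(x, y, z) data.frame(x = x + 1, y = y + 2, z = z + 3),
              d$x, d$y, d$z))
}

# Vectorized: operate on whole columns at once
vectorized <- function(d) {
  transform(d, x = x + 1, y = y + 2, z = z + 3)
}

t_row <- system.time(res_row <- row_wise(big))
t_vec <- system.time(res_vec <- vectorized(big))

# Compare elapsed times; both approaches should return the same values
t_row["elapsed"]
t_vec["elapsed"]
all.equal(res_row, res_vec, check.attributes = FALSE)
```

For more careful measurements, a dedicated benchmarking package would give more stable numbers than a single system.time() call.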