Why is R dplyr::mutate inconsistent with custom functions


Question

This question is a "why", not a "how". In the following code, I am trying to understand why dplyr::mutate evaluates one custom function (f()) with the entire vector but not the other (g()). What exactly is mutate doing?

library(dplyr)

# Reference value: rnorm() recycles the three means across the 100 draws
set.seed(1); sum(rnorm(100, c(0, 10, 100)))

f <- function(m) {
    set.seed(1)
    sum(rnorm(100, mean = m))
}
g <- function(m) sin(m)
df <- data.frame(a = c(0, 10, 100))
y1 <- mutate(df, asq = a^2, fout = f(a), gout = g(a))
y2 <- rowwise(df) %>%
    mutate(asq = a^2, fout = f(a), gout = g(a))
y3 <- group_by(df, a) %>%
    summarize(asq = a^2, fout = f(a), gout = g(a))

For all three columns (asq, fout, and gout), evaluation is row-wise in y2 and y3, and the results are identical. However, y1$fout is 3640.889 for all three rows, which is the result of evaluating sum(rnorm(100, c(0, 10, 100))). So the function f() is evaluating the entire vector for each row.

A closely related question has been asked elsewhere, "mutate/transform in R dplyr (Pass custom function)", but the "why" was not explained.

Recommended answer

sin and ^ are vectorized: given a vector, they return one result per element. mutate passes the whole column a to each function, so g(a) and a^2 come back with one value per row. f is not vectorized: rnorm recycles the vector of means across its 100 draws and sum collapses everything into a single number, which mutate then repeats on every row. You can do f = Vectorize(f), and it will operate on each individual value as well.
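
The difference can be checked in base R without dplyr. The following is a minimal sketch, assuming the definitions of f and g from the question; the names whole and recycled are introduced here for illustration only. It shows that g returns one value per element while f returns a single value, and that a length-1 value is recycled to every row when a data frame is built.

```r
# Definitions copied from the question
f <- function(m) {
  set.seed(1)
  sum(rnorm(100, mean = m))
}
g <- function(m) sin(m)
a <- c(0, 10, 100)

length(g(a))  # one result per element of a
length(f(a))  # a single number, because sum() collapses the draws

# f(a) is the same computation as the reference line in the question:
# rnorm() recycles the means 0, 10, 100 across the 100 draws
set.seed(1)
whole <- sum(rnorm(100, mean = a))
identical(f(a), whole)

# 'recycled' is an illustrative name: data.frame() repeats the length-1
# value on every row, which mirrors what y1$fout looks like
recycled <- data.frame(a = a, fout = f(a))
print(recycled)
```

With rowwise() or group_by(df, a), each group contains a single value of a, so f receives one mean at a time and the per-row results differ.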

y1 <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1
    a   asq     fout       gout
1   0     0 3640.889  0.0000000
2  10   100 3640.889 -0.5440211
3 100 10000 3640.889 -0.5063656

f <- Vectorize(f)

y1a <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1a
    a   asq        fout       gout
1   0     0    10.88874  0.0000000
2  10   100  1010.88874 -0.5440211
3 100 10000 10010.88874 -0.5063656
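
Vectorize is one way to get per-element evaluation; an explicit loop with vapply is another. The sketch below assumes the original, non-vectorized f from the question, and the names fout_vapply, f_vec, and fout_vec are hypothetical, chosen here for illustration. It compares the two approaches in base R.

```r
# Original, non-vectorized definition from the question
f <- function(m) {
  set.seed(1)
  sum(rnorm(100, mean = m))
}
a <- c(0, 10, 100)

# Explicit per-element loop; vapply() also checks that every call
# returns exactly one numeric value
fout_vapply <- vapply(a, f, numeric(1))

# Vectorize() returns a wrapper that calls f once per element (via mapply)
f_vec <- Vectorize(f)
fout_vec <- f_vec(a)

# Both approaches give the same per-element results
all.equal(fout_vapply, fout_vec)
print(fout_vapply)
```

Inside mutate, the same idea can be written as mutate(df, fout = vapply(a, f, numeric(1))), which leaves f itself unchanged.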


Some additional info on vectorization here, here, and here.
