Why is R dplyr::mutate inconsistent with custom functions

Problem description

This question is a "why", not a "how". In the following code, I'm trying to understand why dplyr::mutate evaluates one custom function (f()) with the entire vector but not the other custom function (g()). What exactly is mutate doing?

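library(dplyr)

# Reference value: rnorm() recycles the mean vector c(0, 10, 100) across the 100 draws
set.seed(1); sum(rnorm(100, c(0, 10, 100)))

# f() ends in sum(), so it returns a single number however long m is
f <- function(m) {
    set.seed(1)
    sum(rnorm(100, mean=m))
}
# g() wraps sin(), which returns one value per input element
g <- function(m) sin(m)
df <- data.frame(a=c(0, 10, 100))
y1 <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y2 <- rowwise(df) %>%
    mutate(asq=a^2, fout=f(a), gout=g(a))
y3 <- group_by(df, a) %>%
    summarize(asq=a^2, fout=f(a), gout=g(a))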

For all three columns (asq, fout, and gout), evaluation is rowwise in y2 and y3, and the results are identical. However, y1$fout is 3640.889 for all three rows, which is the result of evaluating sum(rnorm(100, c(0, 10, 100))). So in y1, f() appears to be evaluated on the entire vector rather than on each row's value.

A closely related question has been asked elsewhere ("mutate/transform in R dplyr (Pass custom function)"), but the "why" was not explained.

Solution

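sin and ^ are vectorized: given a vector, they return one result per element. mutate passes the whole column a to each expression, so g(a) and a^2 come back with one value per row. f is not vectorized: sum() collapses its input to a single number, so f(a) returns one value, which mutate recycles into every row. In y2 and y3 each group holds a single row, so f only ever sees one value at a time. You can do f = Vectorize(f) to get a version of f that is called on each individual value as well.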
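One way to see the difference without dplyr is to call both functions on the whole vector directly and compare the lengths of what comes back. The sketch below uses base R only; f and g are redefined so it runs on its own, and the plain vector a is an assumption standing in for the column df$a.

```r
# Same definitions as in the question, repeated so this snippet is self-contained
f <- function(m) {
    set.seed(1)
    sum(rnorm(100, mean = m))
}
g <- function(m) sin(m)

a <- c(0, 10, 100)   # stands in for the column df$a

len_g <- length(g(a))   # 3: sin() returns one result per element
len_f <- length(f(a))   # 1: sum() collapses everything to a single number

# A length-1 result repeated to the column length gives the same value in every row
recycled <- rep(f(a), length.out = length(a))
print(recycled)
```

The repeated value in recycled mirrors what shows up in y1$fout.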

y1 <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1

    a   asq     fout       gout
1   0     0 3640.889  0.0000000
2  10   100 3640.889 -0.5440211
3 100 10000 3640.889 -0.5063656

f = Vectorize(f)

y1a <- mutate(df, asq=a^2, fout=f(a), gout=g(a))
y1a

    a   asq        fout       gout
1   0     0    10.88874  0.0000000
2  10   100  1010.88874 -0.5440211
3 100 10000 10010.88874 -0.5063656
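If overwriting f is not desirable, a possible alternative is to leave f as it is and apply it element by element with base R's vapply(). This is a minimal sketch rather than part of the original answer; f is redefined in its original, unvectorized form, and the names fout_vapply and fout_vectorized are made up for illustration.

```r
# Original, unvectorized f() from the question
f <- function(m) {
    set.seed(1)
    sum(rnorm(100, mean = m))
}

df <- data.frame(a = c(0, 10, 100))

# Call f() once per element; numeric(1) declares that each call returns one number
fout_vapply <- vapply(df$a, f, numeric(1))

# For comparison: the Vectorize() wrapper used in the answer
fout_vectorized <- unname(Vectorize(f)(df$a))

print(fout_vapply)
print(all.equal(fout_vapply, fout_vectorized))
```

The same expression should also work inside mutate, for example fout = vapply(a, f, numeric(1)), since it returns one value per row.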

Some additional info on vectorization here, here, and here.
