R: row-wise dplyr::mutate using function that takes a data frame row and returns an integer

Problem description

I am trying to use a custom function inside a piped mutate statement. I looked at this somewhat similar SO post, but in vain. Say I have a data frame like this (where blob is some variable unrelated to the specific task but part of the overall data):

df <- 
  data.frame(exclude=c('B','B','D'), 
             B=c(1,0,0), 
             C=c(3,4,9), 
             D=c(1,1,0), 
             blob=c('fd', 'fs', 'sa'), 
             stringsAsFactors = FALSE)

I have a function that uses the variable names to select some columns based on the value in the exclude column and then, for example, calculates the sum of the variables not named in exclude (which is always a single character).

FUN <- function(df){
  sum(df[c('B', 'C', 'D')] [!names(df[c('B', 'C', 'D')]) %in% df['exclude']] )
}

When I give a single row (row 1) to FUN, I get the expected sum of C and D (the columns not named by exclude), namely 4:

FUN(df[1,])

How do I do the same in a pipe with mutate (adding the result to a variable s)? These two attempts do not work:

df %>% mutate(s=FUN(.))
df %>% group_by(1:n()) %>% mutate(s=FUN(.))

UPDATE: This also does not work as intended:

df %>% rowwise(.) %>% mutate(s=FUN(.))

This works, of course, but it is not within dplyr's mutate (and pipes):

df$s <- sapply(1:nrow(df), function(x) FUN(df[x,]))

Solution

If you want to use dplyr, you can do so with rowwise, do, and your function FUN:

df %>% 
    rowwise %>% 
    do({
        result = as_data_frame(.)
        result$s = FUN(result)
        result
    })

The same can be achieved using group_by instead of rowwise (as you already tried), but with do instead of mutate:

df %>% 
    group_by(1:n()) %>% 
    do({
        result = as_data_frame(.)
        result$s = FUN(result)
        result
    })

The reason mutate doesn't work in this case is that . in the pipe stands for the whole tibble, not the current row, so it's like calling FUN(df).
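
The point above can be checked without dplyr. The following is a minimal base R sketch that reuses df and FUN from the question; the names whole and per_row are introduced here only for illustration. Calling FUN on the whole data frame yields a single number, which mutate would then recycle into every row, whereas calling it on one-row slices yields one value per row.

```r
df <- data.frame(exclude = c("B", "B", "D"),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c("fd", "fs", "sa"),
                 stringsAsFactors = FALSE)

# FUN exactly as defined in the question.
FUN <- function(df) {
  sum(df[c("B", "C", "D")][!names(df[c("B", "C", "D")]) %in% df["exclude"]])
}

# What mutate(s = FUN(.)) effectively computes: one call on all rows at once.
whole <- FUN(df)
length(whole)  # a single value, not one value per row

# What was intended: one call per one-row slice of the data frame.
per_row <- sapply(seq_len(nrow(df)), function(i) FUN(df[i, ]))
per_row
```

This is why grouping or rowwise alone does not help: the argument handed to FUN is still the full data frame, regardless of how dplyr has partitioned it.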

A much more efficient way of doing the same thing, though, is to build a logical matrix marking which columns to include in each row and then use rowSums.

cols <- c('B', 'C', 'D')
include_mat <- outer(function(x, y) x != y, X = df$exclude, Y = cols)
# or outer(`!=`, X = df$exclude, Y = cols) if it's more readable to you
df$s <- rowSums(df[cols] * include_mat)
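
To make the matrix approach easier to follow, here is a hedged sketch that prints the intermediate mask and cross-checks the result against an explicit row-by-row loop. The names s and expected, and the use of setdiff in the loop, are illustrative choices and not part of the original answer.

```r
df <- data.frame(exclude = c("B", "B", "D"),
                 B = c(1, 0, 0),
                 C = c(3, 4, 9),
                 D = c(1, 1, 0),
                 blob = c("fd", "fs", "sa"),
                 stringsAsFactors = FALSE)

cols <- c("B", "C", "D")

# One row per data frame row, one column per candidate variable:
# TRUE where the variable should be counted, FALSE where it is excluded.
include_mat <- outer(df$exclude, cols, FUN = "!=")
colnames(include_mat) <- cols
include_mat

# Multiplying by the logical mask zeroes out the excluded column in each row,
# so rowSums adds up only the columns that remain.
s <- rowSums(as.matrix(df[cols]) * include_mat)

# Cross-check against an explicit loop over rows.
expected <- vapply(seq_len(nrow(df)), function(i) {
  keep <- setdiff(cols, df$exclude[i])
  sum(unlist(df[i, keep]))
}, numeric(1))

stopifnot(all(unname(s) == expected))
```

The vectorised version avoids building a one-row data frame for every row, which is where most of the cost of the do-based solutions comes from.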
