Dynamic dplyr column name calculation


Question

I have the following code.

colName is passed in. I've been trying to get it evaluated as the value of colName, but have not had much success. I've tried eval, setNames, and so on. Using the "_" variants of the verbs has not worked either.

Essentially, if colName = "MyCol", I want the dplyr chain to execute as if the last line read:

mutate(MyCol = ifelse(is.na(MyCol), "BLANK", MyCol))

makeSummaryTable <- function(colName,originalData){
  result <- originalData %>% 
    group_by_(colName) %>% 
    summarise(numObs = n()) %>% 
    ungroup() %>% 
    arrange(desc(numObs)) %>% 
    rowwise() %>% 
    mutate_(colName = ifelse(is.na(colName), "BLANK",colName))
  return(result)
}


Recommended answer

Here's how to do it with dplyr 0.6.0 using the new tidyeval approach to non-standard evaluation. (I'm not sure whether it's even possible with standard evaluation, at least in a straightforward manner.)

library(dplyr)

makeSummaryTable <- function(colName, originalData){

  colName <- enquo(colName)

  originalData %>% 
    count(!!colName) %>% 
    arrange(desc(n)) %>%
    mutate(
      old_col = !!colName,
      !!quo_name(colName) := if_else(is.na(!!colName), "BLANK",!!colName)
      )
}

makeSummaryTable(hair_color, starwars)
#> # A tibble: 13 x 3
#>       hair_color     n       old_col
#>            <chr> <int>         <chr>
#>  1          none    37          none
#>  2         brown    18         brown
#>  3         black    13         black
#>  4         BLANK     5          <NA>
#>  5         white     4         white
#>  6         blond     3         blond
#>  7        auburn     1        auburn
#>  8  auburn, grey     1  auburn, grey
#>  9 auburn, white     1 auburn, white
#> 10        blonde     1        blonde
#> 11   brown, grey     1   brown, grey
#> 12          grey     1          grey
#> 13       unknown     1       unknown

enquo turns the unquoted column name into an object called a quosure. !! then unquotes the quosure so that it is evaluated as if it had been typed directly in the function. For a more in-depth and accurate explanation, see Hadley's "Programming with dplyr".
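As a minimal sketch of the enquo / !! pair in isolation: the helper count_missing below is hypothetical (it is not part of the original answer) and assumes a dplyr version with tidyeval support. It captures a bare column name and unquotes it inside summarise.

```r
library(dplyr)

# Hypothetical helper: counts NA values in a column passed as a bare name.
count_missing <- function(data, col) {
  col <- enquo(col)                              # capture the bare name as a quosure
  data %>%
    summarise(n_missing = sum(is.na(!!col)))     # !! unquotes it inside the verb
}

# Small made-up data frame for illustration
df <- tibble(x = c("a", NA, "b", NA))
count_missing(df, x)
```

Without the !!, is.na(col) would be applied to the quosure object itself rather than to the column x.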

EDIT: I realized that the original question was about naming the new column with the user-supplied value of colName, and not just colName, so I updated my answer. To accomplish that, the quosure needs to be turned into a string (or label) using quo_name. It can then be unquoted using !! just as a regular quosure would be. The only caveat is that, since R can't make heads or tails of the expression mutate(!!foo = bar), tidyeval introduces the new definition operator := (which might be familiar to data.table users, where it has a somewhat different use). Unlike the traditional assignment operator =, the := operator allows unquoting on both the right-hand and the left-hand side.
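The question passes colName as a string rather than a bare name. A possible variant for that case, offered as a sketch rather than as the answer author's method: convert the string to a symbol with rlang::sym, unquote the symbol where the column is read, and unquote the string itself on the left of :=. The name makeSummaryTableStr is hypothetical, and the code assumes the target column is of type character so that if_else can return "BLANK".

```r
library(dplyr)

# Hypothetical variant: colName arrives as a string, e.g. "hair_color".
makeSummaryTableStr <- function(colName, originalData) {
  col <- rlang::sym(colName)          # string -> symbol, usable with !!
  originalData %>%
    count(!!col) %>%                  # one row per distinct value, count in n
    arrange(desc(n)) %>%
    # := lets the left-hand side be an unquoted string
    mutate(!!colName := if_else(is.na(!!col), "BLANK", !!col))
}

makeSummaryTableStr("hair_color", starwars)
```

Here the string serves directly as the new column name, so quo_name is not needed.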

(I updated the answer to use a data frame that has NA in one of its rows, to illustrate that the last mutate works. I also used count instead of group_by + summarise, and dropped the unnecessary rowwise.)
