R dplyr: operate on a column known only by its string name

Views: 101 | Published: 2020/10/26 3:17:05 | Tags: r dynamic dplyr quoting rlang

Problem description

I am wrestling with programming using dplyr in R to operate on columns of a data frame that are only known by their string names. I know there was recently an update to dplyr to support quosures and the like and I've reviewed what I think are the relevant components of the new "Programming with dplyr" article here: http://dplyr.tidyverse.org/articles/programming.html. However, I'm still not able to do what I want.

My situation is that I know a column name of a data frame only by its string name. Thus, I can't use non-standard evaluation in a call to dplyr within a function or even a script where the column name may change between runs because I can't hard-code the unquoted (i.e., "bare") column name generally. I'm wondering how to get around this, and I'm guessing I'm overlooking something with the new quoting/unquoting syntax.

For example, suppose I have user inputs that define cutoff percentiles for a distribution of data. A user may run the code using any percentile he/she would like, and the percentile he/she picks will change the output. Within the analysis, a column in an intermediate data frame is created with the name of the percentile that is used; thus this column's name changes depending on the cutoff percentile input by the user.

Below is a minimal example to illustrate. I want to call the function with various values for the cutoff percentile. I want the data frame named MPGCutoffs to have a column that is named according to the chosen cutoff quantile (this currently works in the below code), and I want to later operate on this column name. Because of the generality of this column name, I can only know it in terms of the input pctCutoff at the time of writing the function, so I need a way to operate on it when only knowing the string defined by probColName, which follows a predefined pattern based on the value of pctCutoff.

library(dplyr)  # provides %>% and the dplyr verbs used below

userInput_prob1 <- 0.95
userInput_prob2 <- 0.9

# Function to get cars that have the "best" MPG
# fuel economy, where "best" is defined by the
# percentile cutoff passed to the function.
getBestMPG <- function( pctCutoff ){

  # Define new column name to hold the MPG percentile cutoff.
  probColName <- paste0('P', pctCutoff*100)

  # Compute the MPG percentile cutoff by number of gears.
  MPGCutoffs <- mtcars %>%
    dplyr::group_by( gear ) %>%
    dplyr::summarize( !!probColName := quantile(mpg, pctCutoff) )

  # Filter mtcars with only MPG values above cutoffs.
  output <- mtcars %>%
    dplyr::left_join( MPGCutoffs, by='gear' ) %>%
    dplyr::filter( mpg > !!probColName ) #****This doesn't run; this is where I'm stuck

  # Return filtered data.
  return(output)
}

best_1 <- getBestMPG( userInput_prob1 )
best_2 <- getBestMPG( userInput_prob2 )

The dplyr::filter() statement is what I can't get to run properly. I've tried:

dplyr::filter( mpg > probColName ) - No error, but no rows returned.

dplyr::filter( mpg > !!probColName ) - No error, but no rows returned.
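
A minimal sketch of why both attempts return no rows, and of one possible alternative. In both cases the value being compared is the character string itself (for example "P95"), not the column of that name, so mpg is compared with a string. The sketch assumes the .data pronoun, which dplyr makes available inside its verbs for looking up a column by string name; the variable names joined and best are illustrative only.

```r
library(dplyr)

# Column name known only as a string (hypothetical fixed value for this sketch).
probColName <- "P95"

# Per-gear cutoff table with a dynamically named column, as in the question.
MPGCutoffs <- mtcars %>%
  group_by(gear) %>%
  summarize(!!probColName := quantile(mpg, 0.95))

joined <- left_join(mtcars, MPGCutoffs, by = "gear")

# Unquoting a string inserts the string, so this compares mpg with "P95".
# filter(joined, mpg > !!probColName)

# .data[[probColName]] looks the column up by its string name instead.
best <- filter(joined, mpg > .data[[probColName]])
```

Because .data[[...]] takes an ordinary character value, the same line should work whatever name probColName holds at run time.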

I've also seen examples where I could pass something like quo(P95) to the function and then unquote it in the call to dplyr::filter(); I've gotten this to work, but it doesn't solve my problem since it requires hard-coding the variable name outside the function. For example, if I do this and the percentile passed by the user is 0.90, then the call to dplyr::filter() fails because the column created is named P90 and not P95.
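
One way to avoid hard-coding quo(P95) outside the function may be to build the symbol from the string inside the function. The sketch below is a variation of the function above, not a confirmed answer; it assumes rlang::sym(), which converts a string to a symbol that can then be unquoted with !!. The name probCol is introduced here for illustration.

```r
library(dplyr)

getBestMPG <- function(pctCutoff) {
  # Column name derived from the user input, e.g. "P95" or "P90".
  probColName <- paste0("P", pctCutoff * 100)

  # Turn the string into a symbol so that !! inserts a column reference.
  probCol <- rlang::sym(probColName)

  # Compute the MPG percentile cutoff by number of gears.
  MPGCutoffs <- mtcars %>%
    group_by(gear) %>%
    summarize(!!probColName := quantile(mpg, pctCutoff))

  # Keep only cars whose MPG exceeds the cutoff for their gear count.
  mtcars %>%
    left_join(MPGCutoffs, by = "gear") %>%
    filter(mpg > !!probCol)
}

best_1 <- getBestMPG(0.95)
best_2 <- getBestMPG(0.9)
```

With this shape the caller passes only the numeric percentile, and the column name never has to be written out as a bare name.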

Any help would be greatly appreciated. I'm hoping there's an easy solution that I'm just overlooking.
