R: Using dplyr inside a function. Exception in eval(expr, envir, enclos): unknown column

Problem description

I have created a function in R based on the kind help of @Jim M.

When I run the function I get the error: Error: unknown column 'rawdata'. In the debugger I see the message: Rcpp::exception in eval(expr, envir, enclos): unknown column 'rawdata'

However, when I look at the environment window I can see the 2 variables I passed to the function, and they do contain data: rawdata is a factor with 7 levels and refdata is a factor with 28 levels.

function (refdata, rawdata)
{
  wordlist <- expand.grid(rawdata = rawdata, refdata = refdata,
                          stringsAsFactors = FALSE)
  wordlist %>%
    group_by(rawdata) %>%
    mutate(match_score = jarowinkler(rawdata, refdata)) %>%
    summarise(match = match_score[which.max(match_score)],
              matched_to = ref[which.max(match_score)])
}


Recommended answer

This is a problem with functions that use NSE (non-standard evaluation). Such functions are very convenient in interactive work but cause many problems in development, i.e. when you try to use them inside other functions. Because the expressions are not evaluated directly, R cannot find the objects in the environments it searches. I suggest you read here, preferably the scoping issues chapter, for more information.

First of all, you need to know that ALL the standard dplyr functions use NSE. Let's look at an approximate example of your problem:

Data:

df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))


> df
   col1       col2
1     a 0.03366446
2     a 0.46698763
3     a 0.34114682
4     a 0.92125387
5     a 0.94511394
6     b 0.67241460
7     b 0.38168131
8     b 0.91107090
9     b 0.15342089
10    b 0.60751868

Let's see how NSE makes our simple problem crash:

First of all the simple interactive case works:

df %>% group_by(col1) %>% summarise(count = n())

Source: local data frame [2 x 2]

  col1 count
1    a     5
2    b     5

Let's see what happens if I put it in a function:

lets_group <- function(column) {
  df %>% group_by(column) %>% summarise(count = n())
}

> lets_group(col1)
Error: index out of bounds

Not the same error as yours but it is caused by NSE. Exactly the same line of code worked outside the function.
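The mechanism behind this can be illustrated with a small base-R toy. This is only a sketch of how an NSE function captures its argument, not dplyr's actual implementation; show_expr and wrapper are hypothetical names introduced for illustration.

```r
# A toy NSE function: it captures the expression the caller typed
# instead of evaluating it.
show_expr <- function(x) {
  deparse(substitute(x))
}

# Called directly, it sees exactly what was typed.
# col1 does not need to exist anywhere, because x is never evaluated.
show_expr(col1)   # "col1"

# Wrapped in another function, it sees the wrapper's argument name,
# not the name the user passed in.
wrapper <- function(column) {
  show_expr(column)
}

wrapper(col1)     # "column"
```

In the same way, group_by(column) inside lets_group receives the expression column rather than col1, so the column lookup goes wrong.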

Fortunately, there is a solution to your problem, and that is standard evaluation (SE). Hadley also made versions of all the dplyr functions that use standard evaluation. They have the same names as the normal functions with an underscore (_) appended.

Now look at how this will work:

# notice the formula operator (~) in the call to summarise_
lets_group2 <- function(column) {
  df %>% group_by_(column) %>% summarise_(count = ~n())
}

This produces the following result:

#also notice the quotes around col1
> lets_group2('col1')
Source: local data frame [2 x 2]

  col1 count
1    a     5
2    b     5
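The underscore verbs have been deprecated since dplyr 0.7, so on a recent installation the same idea is usually written with tidy evaluation. The following is a minimal sketch, assuming dplyr >= 1.0 and rlang >= 0.4; the function names lets_group_chr and lets_group_bare, and the extra data argument, are illustrative and not part of the original answer.

```r
library(dplyr)

df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = runif(10))

# String input: select the grouping column by name with all_of().
lets_group_chr <- function(data, column) {
  data %>%
    group_by(across(all_of(column))) %>%
    summarise(count = n())
}

# Bare-name input: forward the caller's expression with {{ }}.
lets_group_bare <- function(data, column) {
  data %>%
    group_by({{ column }}) %>%
    summarise(count = n())
}

lets_group_chr(df, "col1")   # quoted column name
lets_group_bare(df, col1)    # unquoted column name
```

Both calls should return one row per value of col1 with its row count. Passing the data frame as an argument, rather than relying on a global df, keeps the functions usable on other data.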

I cannot test your problem but using SE instead of NSE will give you the results you want. For more info you can also read here
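As a rough illustration of how the function from the question might be written defensively, here is a sketch under several assumptions: string_similarity is a hypothetical stand-in for jarowinkler() (it uses Levenshtein distance from utils::adist, a different metric), best_match is an illustrative name, the ref in the question is read as refdata, and the .data pronoun requires dplyr >= 0.7. It has not been run against the asker's data.

```r
library(dplyr)

# Hypothetical stand-in for jarowinkler(): a similarity score derived
# from the Levenshtein distance (1 means identical strings).
string_similarity <- function(a, b) {
  dist <- mapply(function(x, y) drop(utils::adist(x, y)), a, b,
                 USE.NAMES = FALSE)
  1 - dist / pmax(nchar(a), nchar(b), 1)
}

best_match <- function(refdata, rawdata) {
  # as.character() guards against factors being passed in
  wordlist <- expand.grid(
    rawdata = as.character(rawdata),
    refdata = as.character(refdata),
    stringsAsFactors = FALSE
  )

  wordlist %>%
    # the data column rawdata takes precedence over the function argument
    group_by(rawdata) %>%
    # .data$ makes it explicit that these are columns of wordlist
    mutate(match_score = string_similarity(.data$rawdata, .data$refdata)) %>%
    summarise(
      match = max(.data$match_score),
      matched_to = .data$refdata[which.max(.data$match_score)]
    )
}

best_match(refdata = c("apple", "banana", "cherry"),
           rawdata = c("appel", "banan"))
```

To use the original metric, string_similarity could be swapped for jarowinkler(), assuming the package that provides it is loaded.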
