R如何检查从某个包中的特定函数中是否调用了自定义函数 [英] R How to check that a custom function is called within a specific function from a certain package

查看:359
本文介绍了R如何检查从某个包中的特定函数中是否调用了自定义函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建一个只能在另一个函数内使用的函数 myfun ,在我的情况下为 dplyr s 变异摘要。我进一步不想依赖 dplyr 的内部结构(例如 mask $ ... )。


我想出了一个快速而肮脏的解决方法:函数 search_calling_fn 会检查调用堆栈中的所有函数名称,并在其中查找特定的模式

  search_calling_fn<-函数(模式){

call_st<-lapply(sys.calls (),`[[`,1)

res<-任意(unlist(lapply(call_st,function(x)grepl(pattern,x,perl = TRUE))))

if(!res){
stop( myfun()`必须仅在dpl​​yr :: mutate或dplyr :: summarise内部使用))
}否则{
return()
}
}

这可以按预期工作,如以下两个示例所示( dplyr = 1.0.0)

  library(dplyr)

myfun<-function(){
search_calling_fn( ^ mutate | ^ summarise)
NULL
}

#正常抛出
mtcars%>%
mutate(myfun())


myfun2<-function(){
search_calling_fn( ^ select )
NULL
}

#抛出预期的错误
mtcars%>%
mutate(myfun2())

这种方法有一个漏洞: myfun 可以从名称相似的函数中调用不是 dplyr 函数。我不知道如何检查调用堆栈中的函数来自哪个名称空间。 rlang 具有函数 call_ns ,但这仅在使用显式调用该函数时才有效package :: ... 。此外,使用 mutate 时,有一个内部函数 mutate_cols mutate.data.frame 调用堆栈上的S3方法-两者似乎都使命名空间更加复杂。


再三考虑,我想知道是否存在更好,更正式的名称实现相同结果的方法:仅允许在 dplyr s mutate <之内调用 myfun / code>或总结


无论函数如何调用,该方法都应起作用:


  1. mutate

  2. dplyr :: mutate

附加说明


在讨论@ r2evans答案后,我意识到解决方案应该通过以下测试:

  library(dplyr)

myfun<-function() {
search_calling_fn( ^ mutate | ^ summarise)
NULL
}

#屏蔽dplyr的变量
mu的示例tate<-function(df,x){
NULL
}

#应该抛出错误,但不会
mtcars%>%
mutate(myfun())

因此,检查功能不仅应查看调用堆栈,还应尝试查看哪个打包调用堆栈上的函数来自。有趣的是,RStudios调试器显示了调用堆栈上每个函数的名称空间,甚至是内部函数。我不知道该怎么做,因为 environment(fun))仅适用于导出的函数。

解决方案

更新:我要借阅来自 rlang :: trace_back ,因为它似乎有一种优雅(有效)的方法来确定完整的 package :: function 对于大多数调用树(例如%>%不一定总是完全解析)。


(如果您正在尝试减少包裹膨胀...虽然不太可能会出现 dplyr 而不是 purrr 可用,如果您希望尽可能多地执行基本操作,我已经提供了#==#个等效的base-R调用。尝试删除一些 rlang 调用,但是再次...如果您假设 dplyr ,那么您肯定周围有 rlang ,在这种情况下应该没有问题。)

  search_calling_pkg<-函数(pkgs,funcs){
#< borrowed from =" rlang :: trace_back"
框架<-sys.frames()
idx<-rlang ::: trace_find_bottom(NULL,框架)
框架< -frames [idx]
父母< -sys.parents()[idx]
调用<-as.list(sys.calls()[idx])
调用<-purrr :: map(calls,rlang ::: call_fix_car )
#==#调用<-lapply(调用,rlang ::: call_fix_car)
调用<-rlang :::: add_pipe_pointer(调用,帧)
调用<-purrr :: map2(通话,seq_along(通话),rlang ::: maybe_add_namespace)
#==#通话<-Map(rlang ::: maybe_add_namespace,通话,seq_along(calls))
#< ; / borrowed>
calls_chr<-vapply(calls,function(cl)as.character(cl)[1],character(1))
ptn<-paste0( ^(,paste( pkgs,折叠= |),)::)
pkgres<-any(grepl(ptn,calls_chr))
funcres<-!missing(funcs)& ;& any(mapply(grepl,paste0( ^,funcs, $),list(calls_chr)))
if(!pkgres ||!funcres){
stop( ;不正确的)
}否则return()
}

目的是您可以寻找特定的程序包和/或特定的功能。 funcs = 参数可以是固定的字符串(逐字记录),但是由于我认为您可能想与任何 mutate * 匹配。 code>函数(等),也可以使其成为正则表达式。所有功能都必须是完整的 package :: funcname ,而不仅仅是 funcname (尽管您当然可以使其成为正则表达式: -)。

  myfun1<-function(){
search_calling_pkg(pkgs = dplyr; )
NULL
}
myfun2<-function(){
search_calling_pkg(funcs = c( dplyr :: mutate。*, dplyr :: summarize 。*))
NULL
}
mutate<-function(df,x){force(x);空值; }


  mtcars [1:2,]% >%mutate(myfun1())
#错误:不正确

mtcars [1:2,]%>%dplyr :: mutate(myfun1())
#mpg cyl disp hp drat wt qsec vs am gear碳水化合物
#1 21 6 160110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160110 3.9 2.875 17.02 0 1 4 4

mtcars [1:2,]%>%mutate(myfun2())
#错误:不正确

mtcars [1:2,]%>%dplyr :: mutate(myfun2())
#mpg cyl disp hp drat wt qsec vs am gear碳水化合物
#1 21 6 160110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160 110 3.9 2.875 17.02 0 1 4 4

性能似乎比第一个答案要好得多,尽管仍然不是 ;零点击性能:

  microbenchmark :: microbenchmark(
a = mtcars%>%
dplyr :: mutate(),
b = mtcars%>%
dplyr :: mutate(myfun1())

#单位:毫秒
#expr min lq平均中位数uq最大净值
#a 1.5965 1.7444 1.883837 1.82955 1.91655 3.0574 100
#b 3.4748 3.7335 4.187005 3.92580 4.18140 19.4343 100




(这部分是为了繁荣,尽管请注意, getAnywhere 将找到 dplyr :: mutate ,即使上面的非dplyr mutate 已定义并调用。)


<在Rui的链接的支持下,我建议寻找特定功能很可能会错过新功能和/或其他有效但名称不同的功能。 (我没有一个明确的示例。)从这里开始,考虑寻找特定的软件包而不是特定的函数。

  search_calling_pkg<-函数(pkgs){
call_st<-lapply(sys.calls(),`[[`,1)
res <--any(vapply(call_st,function(ca )any(pkgs%in%tryCatch(getAnywhere(as.character(ca)[1])$ ​​where,e​​rror = function(e)″))),logical(1)))
if(! res){
stop( not from packages)
} else return()
}
myfun<-function(){
search_calling_pkg( ; package:dplyr)
NULL
}

意识到这不是一个便宜的操作。我相信大部分时间都在处理调用树,也许不是我们可以轻易补救的事情。

  microbenchmark :: microbenchmark(
a = mtcars%>%mutate(),
b = mtcars%>%mutate(myfun())

#单位:毫秒
#expr min lq平均中位数uq max neval
#a 1.872101 2.165801 2.531046 2.312051 2.72835 4.861202 100
#b 546.916301 571.909551 603.528225 589.995251 612.20240 798.707300 100

如果您认为它不会被频繁调用,并且您的函数需要花费一点时间,那么半秒的延迟可能不会那么明显,但是有了这个玩具例如,差异是明显的。


I want to create a function myfun that can only be used inside another function, in my case dplyrs mutate or summarise. I further do not want to rely on dplyrs internals (for example mask$...).

I came up with a quick and dirty workaround: A function search_calling_fn that checks all function names in the call stack and looks for a specific pattern in the calling functions.

search_calling_fn <- function(pattern) {
  
  call_st <- lapply(sys.calls(), `[[`, 1)
  
  res <- any(unlist(lapply(call_st, function(x) grepl(pattern, x, perl = TRUE))))
  
  if (!res) {
    stop("`myfun()` must only be used inside dplyr::mutate or dplyr::summarise")
  } else {
    return()
  }
}

This works as expected as the two examples below show (dplyr = 1.0.0)

library(dplyr)

myfun <- function() {
  search_calling_fn("^mutate|^summarise")
  NULL
}

# throws as expected no error
mtcars %>% 
  mutate(myfun())


myfun2 <- function() {
  search_calling_fn("^select")
  NULL
}

# throws as expected an error
mtcars %>% 
  mutate(myfun2())

This approach has one loophole: myfun could be called from a function with a similar name which is not a dplyr function. I wonder how I can check from which namespace a function on my call stack is coming. rlang has a function call_ns but this will only work, if the function is explicitly called with package::.... Further, when using mutate there is mutate_cols an internal function and mutate.data.frame an S3 method on the call stack - both seem to make getting the namespace even more complicated.

On a second thought I wonder whether there is a better, more official approach to achieve the same outcome: only allow myfun to be called within dplyrs mutate or summarise.

The approach should work no matter how the function is called:

  1. mutate
  2. dplyr::mutate

Additional note

After discussing @r2evans answer, I realize that a solution should pass the following test:

library(dplyr)

myfun <- function() {
  search_calling_fn("^mutate|^summarise")
  NULL
}

# an example for a function masking dplyr's mutate
mutate <- function(df, x) {
  NULL
}

# should throw an error but doesn't
mtcars %>% 
  mutate(myfun())

So the checking function should not only look at the callstack, but also try to see which package a function on the callstack is coming from. Interestingly, RStudios debugger shows the namespace for each function on the callstack, even for internal functions. I wonder how it does this, since environment(fun)) is only working on exported functions.

解决方案

Update: I'm going to "borrow" from rlang::trace_back, since it seems to have an elegant (and working) method for determining a full package::function for most of the call tree (some like %>% are not always fully-resolved).

(If you're trying to reduce package bloat ... while it's unlikely you'd have dplyr and not purrr available, if you would prefer to do as much in base as possible, I've provided #==# equivalent base-R calls. It's certainly feasible to try to remove some of the rlang calls, but again ... if you're assuming dplyr, then you definitely have rlang around, in which case this should not be a problem.)

search_calling_pkg <- function(pkgs, funcs) {
  # <borrowed from="rlang::trace_back">
  frames <- sys.frames()
  idx <- rlang:::trace_find_bottom(NULL, frames)
  frames <- frames[idx]
  parents <- sys.parents()[idx]
  calls <- as.list(sys.calls()[idx])
  calls <- purrr::map(calls, rlang:::call_fix_car)
  #==# calls <- lapply(calls, rlang:::call_fix_car)
  calls <- rlang:::add_pipe_pointer(calls, frames)
  calls <- purrr::map2(calls, seq_along(calls), rlang:::maybe_add_namespace)
  #==# calls <- Map(rlang:::maybe_add_namespace, calls, seq_along(calls))
  # </borrowed>
  calls_chr <- vapply(calls, function(cl) as.character(cl)[1], character(1))
  ptn <- paste0("^(", paste(pkgs, collapse = "|"), ")::")
  pkgres <- any(grepl(ptn, calls_chr))
  funcres <- !missing(funcs) && any(mapply(grepl, paste0("^", funcs, "$"), list(calls_chr)))
  if (!pkgres || !funcres) {
    stop("not correct")
  } else return()
}

The intention is that you can look for particular packages and/or particular functions. The funcs= argument can be fixed strings (taken as verbatim), but since I thought you might want to match against any of the mutate* functions (etc), you can also make it a regex. All functions need to be full package::funcname, not just funcname (though you could certainly make it a regex :-).

myfun1 <- function() {
  search_calling_pkg(pkgs = "dplyr")
  NULL
}
myfun2 <- function() {
  search_calling_pkg(funcs = c("dplyr::mutate.*", "dplyr::summarize.*"))
  NULL
}
mutate <- function(df, x) { force(x); NULL; }

mtcars[1:2,] %>% mutate(myfun1())
# Error: not correct

mtcars[1:2,] %>% dplyr::mutate(myfun1())
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1  21   6  160 110  3.9 2.620 16.46  0  1    4    4
# 2  21   6  160 110  3.9 2.875 17.02  0  1    4    4

mtcars[1:2,] %>% mutate(myfun2())
# Error: not correct

mtcars[1:2,] %>% dplyr::mutate(myfun2())
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1  21   6  160 110  3.9 2.620 16.46  0  1    4    4
# 2  21   6  160 110  3.9 2.875 17.02  0  1    4    4

And performance seems to be significantly better than the first answer, though still not a "zero hit" on performance:

microbenchmark::microbenchmark(
  a = mtcars %>%
  dplyr::mutate(),
  b = mtcars %>%
  dplyr::mutate(myfun1())
)
# Unit: milliseconds
#  expr    min     lq     mean  median      uq     max neval
#     a 1.5965 1.7444 1.883837 1.82955 1.91655  3.0574   100
#     b 3.4748 3.7335 4.187005 3.92580 4.18140 19.4343   100


(This portion kept for prosperity, though note that getAnywhere will find dplyr::mutate even if the above non-dplyr mutate is defined and called.)

Seeded by Rui's links, I suggest that looking for specific functions might very well miss new functions and/or otherwise-valid but differently-named functions. (I don't have a clear example.) From here, consider looking for particular packages instead of particular functions.

search_calling_pkg <- function(pkgs) {
  call_st <- lapply(sys.calls(), `[[`, 1)
  res <- any(vapply(call_st, function(ca) any(pkgs %in% tryCatch(getAnywhere(as.character(ca)[1])$where, error=function(e) "")), logical(1)))
  if (!res) {
    stop("not called from packages")
  } else return()
}
myfun <- function() {
  search_calling_pkg("package:dplyr")
  NULL
}

Realize that this is not an inexpensive operation. I believe the majority of time spent in this is dealing with the calling tree, perhaps not something we can easily remedy.

microbenchmark::microbenchmark(
  a = mtcars %>% mutate(),
  b = mtcars %>% mutate(myfun())
)
# Unit: milliseconds
#  expr        min         lq       mean     median        uq        max neval
#     a   1.872101   2.165801   2.531046   2.312051   2.72835   4.861202   100
#     b 546.916301 571.909551 603.528225 589.995251 612.20240 798.707300   100

If you believe it will be called infrequently and your function takes "a little time", then perhaps the half-second delay won't be that noticeable, but with this toy example the difference is palpable.

这篇关于R如何检查从某个包中的特定函数中是否调用了自定义函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆