R如何检查从某个包中的特定函数中是否调用了自定义函数 [英] R How to check that a custom function is called within a specific function from a certain package
问题描述
我想创建一个只能在另一个函数内使用的函数 myfun
,在我的情况下为 dplyr
s 变异
或摘要
。我进一步不想依赖 dplyr
的内部结构(例如 mask $ ...
)。
我想出了一个快速而肮脏的解决方法:函数 search_calling_fn
会检查调用堆栈中的所有函数名称,并在其中查找特定的模式
search_calling_fn<-函数(模式){
call_st<-lapply(sys.calls (),`[[`,1)
res<-任意(unlist(lapply(call_st,function(x)grepl(pattern,x,perl = TRUE))))
if(!res){
stop( myfun()`必须仅在dplyr :: mutate或dplyr :: summarise内部使用))
}否则{
return()
}
}
这可以按预期工作,如以下两个示例所示( dplyr
= 1.0.0)
library(dplyr)
myfun<-function(){
search_calling_fn( ^ mutate | ^ summarise)
NULL
}
#正常抛出
mtcars%>%
mutate(myfun())
myfun2<-function(){
search_calling_fn( ^ select )
NULL
}
#抛出预期的错误
mtcars%>%
mutate(myfun2())
这种方法有一个漏洞: myfun
可以从名称相似的函数中调用不是 dplyr
函数。我不知道如何检查调用堆栈中的函数来自哪个名称空间。 rlang
具有函数 call_ns
,但这仅在使用显式调用该函数时才有效package :: ...
。此外,使用 mutate
时,有一个内部函数 mutate_cols
和 mutate.data.frame
调用堆栈上的S3方法-两者似乎都使命名空间更加复杂。
再三考虑,我想知道是否存在更好,更正式的名称实现相同结果的方法:仅允许在 dplyr
s mutate <之内调用
myfun
/ code>或总结
。
无论函数如何调用,该方法都应起作用:
-
mutate
-
dplyr :: mutate
附加说明
在讨论@ r2evans答案后,我意识到解决方案应该通过以下测试:
library(dplyr)
myfun<-function() {
search_calling_fn( ^ mutate | ^ summarise)
NULL
}
#屏蔽dplyr的变量
mu的示例tate<-function(df,x){
NULL
}
#应该抛出错误,但不会
mtcars%>%
mutate(myfun())
因此,检查功能不仅应查看调用堆栈,还应尝试查看哪个打包调用堆栈上的函数来自。有趣的是,RStudios调试器显示了调用堆栈上每个函数的名称空间,甚至是内部函数。我不知道该怎么做,因为 environment(fun))
仅适用于导出的函数。
更新:我要借阅来自 rlang :: trace_back
,因为它似乎有一种优雅(有效)的方法来确定完整的 package :: function
对于大多数调用树(例如%>%
不一定总是完全解析)。
(如果您正在尝试减少包裹膨胀...虽然不太可能会出现 dplyr
而不是 purrr
可用,如果您希望尽可能多地执行基本操作,我已经提供了#==#
个等效的base-R调用。尝试删除一些 rlang
调用,但是再次...如果您假设 dplyr
,那么您肯定周围有 rlang
,在这种情况下应该没有问题。)
search_calling_pkg<-函数(pkgs,funcs){
#< borrowed from =" rlang :: trace_back"
框架<-sys.frames()
idx<-rlang ::: trace_find_bottom(NULL,框架)
框架< -frames [idx]
父母< -sys.parents()[idx]
调用<-as.list(sys.calls()[idx])
调用<-purrr :: map(calls,rlang ::: call_fix_car )
#==#调用<-lapply(调用,rlang ::: call_fix_car)
调用<-rlang :::: add_pipe_pointer(调用,帧)
调用<-purrr :: map2(通话,seq_along(通话),rlang ::: maybe_add_namespace)
#==#通话<-Map(rlang ::: maybe_add_namespace,通话,seq_along(calls))
#< ; / borrowed>
calls_chr<-vapply(calls,function(cl)as.character(cl)[1],character(1))
ptn<-paste0( ^(,paste( pkgs,折叠= |),)::)
pkgres<-any(grepl(ptn,calls_chr))
funcres<-!missing(funcs)& ;& any(mapply(grepl,paste0( ^,funcs, $),list(calls_chr)))
if(!pkgres ||!funcres){
stop( ;不正确的)
}否则return()
}
目的是您可以寻找特定的程序包和/或特定的功能。 funcs =
参数可以是固定的字符串(逐字记录),但是由于我认为您可能想与任何 mutate * 匹配。 code>函数(等),也可以使其成为正则表达式。所有功能都必须是完整的
package :: funcname
,而不仅仅是 funcname
(尽管您当然可以使其成为正则表达式: -)。
myfun1<-function(){
search_calling_pkg(pkgs = dplyr; )
NULL
}
myfun2<-function(){
search_calling_pkg(funcs = c( dplyr :: mutate。*, dplyr :: summarize 。*))
NULL
}
mutate<-function(df,x){force(x);空值; }
mtcars [1:2,]% >%mutate(myfun1())
#错误:不正确
mtcars [1:2,]%>%dplyr :: mutate(myfun1())
#mpg cyl disp hp drat wt qsec vs am gear碳水化合物
#1 21 6 160110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160110 3.9 2.875 17.02 0 1 4 4
mtcars [1:2,]%>%mutate(myfun2())
#错误:不正确
mtcars [1:2,]%>%dplyr :: mutate(myfun2())
#mpg cyl disp hp drat wt qsec vs am gear碳水化合物
#1 21 6 160110 3.9 2.620 16.46 0 1 4 4
#2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
性能似乎比第一个答案要好得多,尽管仍然不是 ;零点击性能:
microbenchmark :: microbenchmark(
a = mtcars%>%
dplyr :: mutate(),
b = mtcars%>%
dplyr :: mutate(myfun1())
)
#单位:毫秒
#expr min lq平均中位数uq最大净值
#a 1.5965 1.7444 1.883837 1.82955 1.91655 3.0574 100
#b 3.4748 3.7335 4.187005 3.92580 4.18140 19.4343 100
(这部分是为了繁荣,尽管请注意, getAnywhere
将找到 dplyr :: mutate
,即使上面的非dplyr mutate
已定义并调用。)
<在Rui的链接的支持下,我建议寻找特定功能很可能会错过新功能和/或其他有效但名称不同的功能。 (我没有一个明确的示例。)从这里开始,考虑寻找特定的软件包而不是特定的函数。
search_calling_pkg<-函数(pkgs){
call_st<-lapply(sys.calls(),`[[`,1)
res <--any(vapply(call_st,function(ca )any(pkgs%in%tryCatch(getAnywhere(as.character(ca)[1])$ where,error = function(e)″))),logical(1)))
if(! res){
stop( not from packages)
} else return()
}
myfun<-function(){
search_calling_pkg( ; package:dplyr)
NULL
}
意识到这不是一个便宜的操作。我相信大部分时间都在处理调用树,也许不是我们可以轻易补救的事情。
microbenchmark :: microbenchmark(
a = mtcars%>%mutate(),
b = mtcars%>%mutate(myfun())
)
#单位:毫秒
#expr min lq平均中位数uq max neval
#a 1.872101 2.165801 2.531046 2.312051 2.72835 4.861202 100
#b 546.916301 571.909551 603.528225 589.995251 612.20240 798.707300 100
如果您认为它不会被频繁调用,并且您的函数需要花费一点时间,那么半秒的延迟可能不会那么明显,但是有了这个玩具例如,差异是明显的。
I want to create a function myfun
that can only be used inside another function, in my case dplyr
s mutate
or summarise
. I further do not want to rely on dplyr
s internals (for example mask$...
).
I came up with a quick and dirty workaround: A function search_calling_fn
that checks all function names in the call stack and looks for a specific pattern in the calling functions.
search_calling_fn <- function(pattern) {
call_st <- lapply(sys.calls(), `[[`, 1)
res <- any(unlist(lapply(call_st, function(x) grepl(pattern, x, perl = TRUE))))
if (!res) {
stop("`myfun()` must only be used inside dplyr::mutate or dplyr::summarise")
} else {
return()
}
}
This works as expected as the two examples below show (dplyr
= 1.0.0)
library(dplyr)
myfun <- function() {
search_calling_fn("^mutate|^summarise")
NULL
}
# throws as expected no error
mtcars %>%
mutate(myfun())
myfun2 <- function() {
search_calling_fn("^select")
NULL
}
# throws as expected an error
mtcars %>%
mutate(myfun2())
This approach has one loophole: myfun
could be called from a function with a similar name which is not a dplyr
function. I wonder how I can check from which namespace a function on my call stack is coming. rlang
has a function call_ns
but this will only work, if the function is explicitly called with package::...
. Further, when using mutate
there is mutate_cols
an internal function and mutate.data.frame
an S3 method on the call stack - both seem to make getting the namespace even more complicated.
On a second thought I wonder whether there is a better, more official approach to achieve the same outcome: only allow myfun
to be called within dplyr
s mutate
or summarise
.
The approach should work no matter how the function is called:
mutate
dplyr::mutate
Additional note
After discussing @r2evans answer, I realize that a solution should pass the following test:
library(dplyr)
myfun <- function() {
search_calling_fn("^mutate|^summarise")
NULL
}
# an example for a function masking dplyr's mutate
mutate <- function(df, x) {
NULL
}
# should throw an error but doesn't
mtcars %>%
mutate(myfun())
So the checking function should not only look at the callstack, but also try to see which package a function on the callstack is coming from. Interestingly, RStudios debugger shows the namespace for each function on the callstack, even for internal functions. I wonder how it does this, since environment(fun))
is only working on exported functions.
Update: I'm going to "borrow" from rlang::trace_back
, since it seems to have an elegant (and working) method for determining a full package::function
for most of the call tree (some like %>%
are not always fully-resolved).
(If you're trying to reduce package bloat ... while it's unlikely you'd have dplyr
and not purrr
available, if you would prefer to do as much in base as possible, I've provided #==#
equivalent base-R calls. It's certainly feasible to try to remove some of the rlang
calls, but again ... if you're assuming dplyr
, then you definitely have rlang
around, in which case this should not be a problem.)
search_calling_pkg <- function(pkgs, funcs) {
# <borrowed from="rlang::trace_back">
frames <- sys.frames()
idx <- rlang:::trace_find_bottom(NULL, frames)
frames <- frames[idx]
parents <- sys.parents()[idx]
calls <- as.list(sys.calls()[idx])
calls <- purrr::map(calls, rlang:::call_fix_car)
#==# calls <- lapply(calls, rlang:::call_fix_car)
calls <- rlang:::add_pipe_pointer(calls, frames)
calls <- purrr::map2(calls, seq_along(calls), rlang:::maybe_add_namespace)
#==# calls <- Map(rlang:::maybe_add_namespace, calls, seq_along(calls))
# </borrowed>
calls_chr <- vapply(calls, function(cl) as.character(cl)[1], character(1))
ptn <- paste0("^(", paste(pkgs, collapse = "|"), ")::")
pkgres <- any(grepl(ptn, calls_chr))
funcres <- !missing(funcs) && any(mapply(grepl, paste0("^", funcs, "$"), list(calls_chr)))
if (!pkgres || !funcres) {
stop("not correct")
} else return()
}
The intention is that you can look for particular packages and/or particular functions. The funcs=
argument can be fixed strings (taken as verbatim), but since I thought you might want to match against any of the mutate*
functions (etc), you can also make it a regex. All functions need to be full package::funcname
, not just funcname
(though you could certainly make it a regex :-).
myfun1 <- function() {
search_calling_pkg(pkgs = "dplyr")
NULL
}
myfun2 <- function() {
search_calling_pkg(funcs = c("dplyr::mutate.*", "dplyr::summarize.*"))
NULL
}
mutate <- function(df, x) { force(x); NULL; }
mtcars[1:2,] %>% mutate(myfun1())
# Error: not correct
mtcars[1:2,] %>% dplyr::mutate(myfun1())
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# 2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
mtcars[1:2,] %>% mutate(myfun2())
# Error: not correct
mtcars[1:2,] %>% dplyr::mutate(myfun2())
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# 2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
And performance seems to be significantly better than the first answer, though still not a "zero hit" on performance:
microbenchmark::microbenchmark(
a = mtcars %>%
dplyr::mutate(),
b = mtcars %>%
dplyr::mutate(myfun1())
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# a 1.5965 1.7444 1.883837 1.82955 1.91655 3.0574 100
# b 3.4748 3.7335 4.187005 3.92580 4.18140 19.4343 100
(This portion kept for prosperity, though note that getAnywhere
will find dplyr::mutate
even if the above non-dplyr mutate
is defined and called.)
Seeded by Rui's links, I suggest that looking for specific functions might very well miss new functions and/or otherwise-valid but differently-named functions. (I don't have a clear example.) From here, consider looking for particular packages instead of particular functions.
search_calling_pkg <- function(pkgs) {
call_st <- lapply(sys.calls(), `[[`, 1)
res <- any(vapply(call_st, function(ca) any(pkgs %in% tryCatch(getAnywhere(as.character(ca)[1])$where, error=function(e) "")), logical(1)))
if (!res) {
stop("not called from packages")
} else return()
}
myfun <- function() {
search_calling_pkg("package:dplyr")
NULL
}
Realize that this is not an inexpensive operation. I believe the majority of time spent in this is dealing with the calling tree, perhaps not something we can easily remedy.
microbenchmark::microbenchmark(
a = mtcars %>% mutate(),
b = mtcars %>% mutate(myfun())
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# a 1.872101 2.165801 2.531046 2.312051 2.72835 4.861202 100
# b 546.916301 571.909551 603.528225 589.995251 612.20240 798.707300 100
If you believe it will be called infrequently and your function takes "a little time", then perhaps the half-second delay won't be that noticeable, but with this toy example the difference is palpable.
这篇关于R如何检查从某个包中的特定函数中是否调用了自定义函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!