Identifying dependencies of R functions and scripts


Problem description


I am sifting through a package and scripts that utilize the package, and would like to identify external dependencies. The goal is to modify scripts to specify library(pkgName) and to modify functions in the package to use require(pkgName), so that these dependencies will be more obvious later.

I am revising the code to account for each externally dependent package. As an example, though it is by no means definitive, I am now finding it difficult to identify code that depends on data.table. I could replace data.table with Matrix, ggplot2, bigmemory, plyr, or many other packages, so feel free to answer with examples based on other packages.

This search isn't particularly easy. The approaches I have tried so far include:

  • Search the code for library and require statements
  • Search for mentions of data.table (e.g. library(data.table))
  • Try running codetools::checkUsage to determine where there may be some issues. For the scripts, my program inserts the script into a local function and applies checkUsage to that function. Otherwise, I use checkUsagePackage for the package.
  • Look for statements that are somewhat unique to data.table, such as :=.
  • Look for where objects' classes may be identified via Hungarian notation, such as DT
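
The first two searches in the list above can be sketched with a small base R helper. The function name `findAttachedPackages` and the demo file are hypothetical (they are not from the original post), and the regular expression is only a rough approximation: it sees the first call on each line and would misreport `library(pkg, character.only = TRUE)`.

```r
# Hypothetical helper (not from the original post): list packages named in
# library(), require() or requireNamespace() calls in one file.
findAttachedPackages <- function(path) {
    code <- readLines(path, warn = FALSE)
    # Crude comment stripping: this also truncates string literals containing '#'.
    code <- sub("#.*$", "", code)
    pattern <- "\\b(library|require|requireNamespace)\\s*\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)"
    # regexec() reports only the first match on each line.
    hits <- regmatches(code, regexec(pattern, code, perl = TRUE))
    pkgs <- vapply(hits,
                   function(h) if (length(h) == 3) h[3] else NA_character_,
                   character(1))
    unique(pkgs[!is.na(pkgs)])
}

# Usage on a throwaway script
demoScript <- tempfile(fileext = ".R")
writeLines(c("library(data.table)",
             "x <- 1  # require(plyr) is commented out",
             "if (require(\"Matrix\")) y <- 2"),
           demoScript)
attachedPkgs <- findAttachedPackages(demoScript)
print(attachedPkgs)
```

This only finds explicit loads, which is the easy part described below; it says nothing about code that silently assumes a package is attached.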

The essence of my searching is to find:

  • loading of data.table,
  • objects with names that indicate they are data.table objects,
  • methods that appear to be data.table-specific

The only easy part of this seems to be finding where the package is loaded. Unfortunately, not all functions may explicitly load or require the external package - these may assume it has already been loaded. This is a bad practice, and I am trying to fix it. However, searching for objects and methods seems to be challenging.

This (data.table) is just one package, and one with what seems to be limited and somewhat unique usage. Suppose I wanted to look for uses of ggplot functions, where the options are more extensive, and the text of the syntax is not as idiosyncratic (i.e. frequent usage of + is not idiosyncratic, while := seems to be).

I don't think that static analysis will give a perfect answer, e.g. one could pass an argument to a function, which specifies a package to be loaded. Nonetheless: are there any core tools or packages that can improve on this brute force approach, either via static or dynamic analysis?
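
As a rough illustration of what a static approach might look like, the sketch below parses code without running it and intersects the called function names with a package's exports. The helper name `calledFrom` is hypothetical, and `stats` stands in for the external package so the example runs without extra installs. It matches by name only, so a local function that happens to share a name with an export is a false positive, and operators such as `:=` are not reported as function-call tokens.

```r
# Hypothetical helper: which exported functions of `pkg` are called by name
# in the given code? The target package must be installed, since its
# namespace is loaded to read the export list.
calledFrom <- function(codeText, pkg) {
    parsed <- parse(text = codeText, keep.source = TRUE)
    tokens <- utils::getParseData(parsed)
    calls  <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])
    intersect(calls, getNamespaceExports(pkg))
}

statsCalls <- calledFrom(c("x <- rnorm(10)", "m <- median(x)", "print(m)"), "stats")
print(statsCalls)
```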

For what it's worth, tools::pkgDepends only addresses dependencies at the package level, not the function or script level, which is the level I'm working at.


Update 1: An example of a dynamic analysis tool that should work is one that reports which packages are loaded during code execution. I don't know if such a capability exists in R, though - it would be like Rprof reporting the output of search() instead of the code stack.
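
A minimal sketch of that idea, assuming it is acceptable to actually run the script: compare `loadedNamespaces()` before and after sourcing it. The helper name `packagesLoadedBy` and the demo script are hypothetical. It reports only namespaces that were not already loaded, so it is best run in a fresh session, and it cannot say which line triggered the load.

```r
# Hypothetical helper: namespaces that become loaded while a script runs.
packagesLoadedBy <- function(scriptPath) {
    before <- loadedNamespaces()
    # Evaluate in a scratch environment so the script's objects do not leak.
    source(scriptPath, local = new.env())
    setdiff(loadedNamespaces(), before)
}

# Usage: a throwaway script that touches the splines package, which ships
# with R but is not loaded by default.
dynScript <- tempfile(fileext = ".R")
writeLines("s <- splines::bs(1:10, df = 3)", dynScript)
if ("splines" %in% loadedNamespaces()) unloadNamespace("splines")
newlyLoaded <- packagesLoadedBy(dynScript)
print(newlyLoaded)
```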

Solution

First, thanks to @mathematical.coffee for putting me on the path of using Mark Bravington's mvbutils package. The foodweb function is more than satisfactory.

To recap, I wanted to know about checking one package, say myPackage, against another, say externalPackage, and about checking scripts against externalPackage. I'll demonstrate how to do each. In this case, the external package is data.table.

1: For myPackage versus data.table, the following commands suffice:

library(mvbutils)
library(myPackage)
library(data.table)
ixWhere <- match(c("package:myPackage", "package:data.table"), search())  # search() entries carry a "package:" prefix
foodweb(where = ixWhere, prune = ls("package:data.table"), descendents = FALSE)

This produces an excellent graph showing which functions depend on functions in data.table. Although the graph includes dependencies within data.table, it's not overly burdensome: I can easily see which of my functions depend on data.table, and which functions they use, such as as.data.table, data.table, :=, key, and so on. At this point, one could say the package dependency problem is solved, but foodweb offers so much more, so let's look at that. The cool part is the dependency matrix.

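# foodweb() returns a list; the 0/1 call matrix is in $funmat (rows call columns)
depMat  <- foodweb(where = ixWhere, prune = ls("package:data.table"), descendents = FALSE, plotting = FALSE)$funmat
isMine  <- grepl("^myPackage\\.", rownames(depMat))
# keep my functions as rows and everything else as columns
depMat  <- depMat[isMine, !isMine, drop = FALSE]
# drop functions that none of mine call, then my functions that call nothing external
# (logical indexing stays correct even when there is nothing to drop)
depMat  <- depMat[, colSums(depMat) > 0, drop = FALSE]
depMat  <- depMat[rowSums(depMat) > 0, , drop = FALSE]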

This is cool: it now shows dependencies of functions in my package, where I'm using verbose names, e.g. myPackage.cleanData, on functions not in my package, namely functions in data.table, and it eliminates rows and columns where there are no dependencies. This is concise, lets me survey dependencies quickly, and I can find the complementary set for my functions quite easily, too, by processing rownames(depMat).
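
To make the trimming concrete without needing mvbutils, here is a small sketch on a made-up 0/1 call matrix. The helper `trimDeps` and the toy function names are hypothetical; the matrix only imitates the row-calls-column layout of foodweb's `funmat`.

```r
# Hypothetical helper: keep own functions as rows, external functions as
# columns, and drop rows and columns with no dependencies.
trimDeps <- function(funmat, ownPattern) {
    own <- grepl(ownPattern, rownames(funmat))
    m <- funmat[own, !own, drop = FALSE]
    m <- m[, colSums(m) > 0, drop = FALSE]
    m[rowSums(m) > 0, , drop = FALSE]
}

# Toy call matrix: a 1 means the row function calls the column function.
fnNames <- c("myPackage.cleanData", "myPackage.plotData",
             "as.data.table", "key", "setkey")
toyMat <- matrix(0, nrow = 5, ncol = 5, dimnames = list(fnNames, fnNames))
toyMat["myPackage.cleanData", c("as.data.table", "key")] <- 1
toyMat["myPackage.plotData", "myPackage.cleanData"] <- 1

trimmed <- trimDeps(toyMat, "^myPackage\\.")
print(trimmed)
```

In this toy case only `myPackage.cleanData` survives as a row, because `myPackage.plotData` calls nothing outside the package, and `setkey` is dropped as a column because nothing calls it.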

NB: plotting = FALSE doesn't seem to prevent a plotting device from being created, at least the first time that foodweb is called in a sequence of calls. That is annoying, but not terrible. Maybe I'm doing something wrong.

2: For scripts versus data.table, this gets a little more interesting. For each script, I need to create a temporary function, and then check for dependencies. I have a little function below that does precisely that.

# dir() takes a regular expression, not a shell glob
listFiles <- dir(pattern = "^myScript.*\\.r$")
checkScriptDependencies <- function(fname){
    require(mvbutils)
    rawCode  <- readLines(fname)
    # wrap the script in a function so that foodweb can treat it as one
    toParse  <- paste("localFunc <- function(){", paste(rawCode, collapse = "\n"), "}", sep = "\n")
    # defines localFunc in this function's environment
    eval(parse(text = toParse))
    # search() entries carry a "package:" prefix
    ix       <- match("package:data.table", search())
    vecPrune <- c("localFunc", ls("package:data.table"))
    tmpRes   <- foodweb(where = c(environment(), ix), prune = vecPrune, plotting = FALSE)
    tmpMat   <- tmpRes$funmat
    # row of 0/1 flags: which functions localFunc calls
    tmpVec   <- tmpMat["localFunc",]
    return(tmpVec)
}

listDeps <- list()
for(selFile in listFiles){
    listDeps[[selFile]] <- checkScriptDependencies(selFile)
}

Now, I just need to look at listDeps, and I have the same kind of wonderful little insights that I have from the depMat above. I modified checkScriptDependencies from other code that I wrote that sends scripts to be analyzed by codetools::checkUsage; it's good to have a little function like this around for analyzing standalone code. Kudos to @Spacedman and @Tommy for insights that improved the call to foodweb, using environment().
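
For comparison, the same wrap-the-script-in-a-function trick can be sketched with codetools alone. This is not the code referred to above: the helper name `scriptFunctionCalls` is hypothetical, and it uses `codetools::findGlobals` (a sibling of `checkUsage`) to list the global functions a script refers to, without running the script or needing data.table installed.

```r
# Hypothetical helper: wrap script lines in a function body and ask
# codetools which global functions that body refers to.
scriptFunctionCalls <- function(codeLines) {
    exprs <- parse(text = codeLines)
    wrapper <- function() NULL
    # Build `{ expr1; expr2; ... }` and install it as the function body.
    body(wrapper) <- as.call(c(list(as.name("{")), as.list(exprs)))
    codetools::findGlobals(wrapper, merge = FALSE)$functions
}

demoCode <- c("DT <- data.table(a = 1:3)",
              "DT[, b := a * 2]",
              "setkey(DT, a)")
scriptCalls <- scriptFunctionCalls(demoCode)
print(intersect(scriptCalls, c("data.table", ":=", "setkey")))
```

The result is a flat list of names with no indication of which package each belongs to, so it would still need to be matched against something like `ls("package:data.table")`.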

(True hungaRians will notice that I was inconsistent with the order of name and type - tooBad. :) There's a longer reason for this, but this isn't precisely the code I'm using, anyway.)


Although I've not posted pictures of the graphs produced by foodweb for my code, you can see some nice examples at http://web.archive.org/web/20120413190726/http://www.sigmafield.org/2010/09/21/r-function-of-the-day-foodweb. In my case, its output definitely captures data.table's usage of := and J, along with the standard named functions, like key and as.data.table. It seems to obviate my text searches and is an improvement in several ways (e.g. finding functions that I'd overlooked).

All in all, foodweb is an excellent tool, and I encourage others to explore the mvbutils package and some of Mark Bravington's other nice packages, such as debug. If you do install mvbutils, just check out ?changed.funs if you think that only you struggle with managing evolving R code. :)
