Regular expression to find function calls in a function body
Problem description
Please consider the body of read.table as a text file, created with the following code:
sink("readTable.txt")
body(read.table)
sink()
Using regular expressions, I'd like to find all function calls of the form foo(a, b, c) (but with any number of arguments) in "readTable.txt". That is, I'd like the result to contain the names of all called functions in the body of read.table. This includes nested functions of the form foo(a, bar(b, c)). Reserved words (return, for, etc.) and functions that use back-ticks (`==`(), `+`(), etc.) can be included since I can remove them later.
So in general, I'm looking for the pattern text( or text ( and then possible nested functions like text1(text2(, but skipping over the text if it's an argument and not a function. Here's where I'm at so far. It's close, but not quite there.
x <- readLines("readTable.txt")
regx <- "^(([[:print:]]*)\\(+.*\\))"
mat <- regexpr(regx, x)
lines <- regmatches(x, mat)
fns <- gsub(".*( |(=|(<-)))", "", lines)
head(fns, 10)
# [1] "default.stringsAsFactors()" "!missing(text))"
# [3] "\"UTF-8\")" "on.exit(close(file))" "(is.character(file))"
# [6] "(nzchar(fileEncoding))" "fileEncoding)" "\"rt\")"
# [9] "on.exit(close(file))" "\"connection\"))"
For example, in [9] above, the calls are there, but I do not want file in the result. Ideally it would be on.exit(close(.
How can I go about improving this regular expression?
If you've ever tried to parse HTML with a regular expression, you know what a nightmare it can be. It's always better to use an HTML parser and extract the information that way. I feel the same way about R code. The beauty of R is that it's functional and you can inspect any function via code.
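To make "inspect via code" concrete, here is a minimal sketch of what a call looks like once R has parsed it. The expression is taken from the question's output; the variable names `e`, `outer_name` and `inner_name` are only illustrative.

```r
# A parsed call behaves like a list: element 1 is the function being
# called, the remaining elements are its arguments.
e <- quote(on.exit(close(file)))

is.call(e)                               # TRUE
outer_name <- as.character(e[[1]])       # "on.exit"

# The first argument is itself a call, so it can be inspected the same way.
inner <- e[[2]]                          # close(file)
inner_name <- as.character(inner[[1]])   # "close"

# The argument of the inner call is a plain symbol, not a call.
is.name(inner[[2]])                      # TRUE
```

Because function names and arguments are already separated in this structure, no pattern matching on text is needed to tell them apart.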
Something like
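# Operators and indexing functions to leave out of the result.
call.ignore <- c("[[", "[", "&", "&&", "|", "||", "==", "!=",
                 "-", "+", "*", "/", "!", ">", "<", ":")

find.funcs <- function(f, descend = FALSE) {
  if (is.function(f)) {
    # Start from the function's body.
    return(find.funcs(body(f), descend = descend))
  } else if (is(f, "name") || is.atomic(f)) {
    # Symbols and constants contain no calls.
    return(character(0))
  }
  v <- list()
  if (is(f, "call") && !(deparse(f[[1]]) %in% call.ignore)) {
    # Record the call; stop here unless asked to look inside it.
    v[[1]] <- deparse(f)
    if (!descend) return(v[[1]])
  }
  # Recurse into every component of the expression.
  v <- append(v, lapply(as.list(f), find.funcs, descend = descend))
  unname(do.call(c, v))
}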
could work. Here we iterate over each object in the function looking for calls, ignoring those you don't care about. You would run it on a function like
find.funcs(read.table)
# [1] "default.stringsAsFactors()"
# [2] "missing(file)"
# [3] "missing(text)"
# [4] "textConnection(text, encoding = \"UTF-8\")"
# [5] "on.exit(close(file))"
# [6] "is.character(file)"
# ...
You can set the descend= parameter to TRUE if you also want to look inside each recorded call for further function calls.
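Since the question asks for function names rather than whole calls, the same traversal can be adapted to return only names. The following is a rough sketch, not part of the answer above: `call_names` and the toy function `f` are hypothetical, and the helper would need extra handling for expressions with empty arguments such as x[, 1].

```r
# Hypothetical helper: collect the names of all called functions,
# always descending into nested calls.
call_names <- function(e) {
  if (is.function(e)) return(call_names(body(e)))
  if (!is.call(e)) return(character(0))   # symbols and constants
  head <- e[[1]]
  # The head is usually a symbol; if it is itself a call, walk it too.
  own <- if (is.name(head)) as.character(head) else call_names(head)
  unique(c(own, unlist(lapply(as.list(e)[-1], call_names))))
}

# Toy function used only for illustration.
f <- function(file) {
  on.exit(close(file))
  nchar(readLines(file))
}

res <- call_names(f)
res
# [1] "{"         "on.exit"   "close"     "nchar"     "readLines"
```

Syntax elements such as `{` show up as well, which matches the question's remark that reserved words can be filtered out afterwards.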
I'm sure there are plenty of packages that make this easier, but I just wanted to show how simple it really is.
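Even without extra packages, base R has two helpers that get close to this: all.names() lists every symbol in an expression, including function names, and all.vars() lists only the variables. A minimal sketch, assuming that no name is used both as a variable and as a function (such names would be dropped by the set difference):

```r
expr <- quote(on.exit(close(file)))

every_name <- all.names(expr)   # "on.exit" "close" "file"
var_names  <- all.vars(expr)    # "file"

# Names that appear in the expression but are not variables.
fn_names <- setdiff(every_name, var_names)
fn_names
# [1] "on.exit" "close"

# The same idea applied to the function from the question.
b <- body(read.table)
rt_fns <- setdiff(all.names(b, unique = TRUE), all.vars(b))
head(rt_fns)
```

This returns names only, so it cannot show how the calls are nested; for that, a recursive walk over the expression is still needed.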