Regular expression to find function calls in a function body


Problem description


Please consider the body of read.table as a text file, created with the following code:

sink("readTable.txt")
body(read.table)
sink()

Using regular expressions, I'd like to find all function calls of the form foo(a, b, c) (but with any number of arguments) in "readTable.txt". That is, I'd like the result to contain the names of all called functions in the body of read.table. This includes nested functions of the form
foo(a, bar(b, c)). Reserved words (return, for, etc) and functions that use back-ticks ('=='(), '+'(), etc) can be included since I can remove them later.

So in general, I'm looking for the pattern text( or text ( then possible nested functions like text1(text2(, but skipping over the text if it's an argument, and not a function. Here's where I'm at so far. It's close, but not quite there.

x <- readLines("readTable.txt")
regx <- "^(([[:print:]]*)\\(+.*\\))"
mat <- regexpr(regx, x)
lines <- regmatches(x, mat)
fns <- gsub(".*( |(=|(<-)))", "", lines)
head(fns, 10)
# [1] "default.stringsAsFactors()" "!missing(text))"
# [3] "\"UTF-8\")" "on.exit(close(file))" "(is.character(file))"
# [6] "(nzchar(fileEncoding))" "fileEncoding)" "\"rt\")"
# [9] "on.exit(close(file))" "\"connection\"))"

For example, in [9] above, the calls are there, but I do not want file in the result. Ideally it would be on.exit(close(.

How can I go about improving this regular expression?

Solution

If you've ever tried to parse HTML with a regular expression, you know what a nightmare it can be. It's always better to use an HTML parser and extract the information that way. I feel the same way about R code. The beauty of R is that it's functional and you can inspect any function via code.

Something like

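# Operators and indexing functions that should not be reported as calls
call.ignore <- c("[[", "[", "&", "&&", "|", "||", "==", "!=",
    "-", "+", "*", "/", "!", ">", "<", ":")

find.funcs <- function(f, descend = FALSE) {
    if (is.function(f)) {
        # Given a function, start from its body
        return(find.funcs(body(f), descend = descend))
    } else if (is(f, "name") || is.atomic(f)) {
        # Symbols and constants contain no calls
        return(character(0))
    }
    v <- list()
    if (is(f, "call") && !(deparse(f[[1]]) %in% call.ignore)) {
        # Record the whole call as text
        v[[1]] <- deparse(f)
        if (!descend) return(v[[1]])
    }
    # Recurse into every component of the current language object
    v <- append(v, lapply(as.list(f), find.funcs, descend = descend))
    unname(do.call(c, v))
}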

could work. Here we iterate over each object in the function looking for calls, ignoring those you don't care about. You would run it on a function like

find.funcs(read.table)

# [1] "default.stringsAsFactors()"                
# [2] "missing(file)"                             
# [3] "missing(text)"                             
# [4] "textConnection(text, encoding = \"UTF-8\")"
# [5] "on.exit(close(file))"                      
# [6] "is.character(file)"  
# ...
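
To see why walking the body works where a regular expression struggles, it may help to look at how R stores a single call. The snippet below is only an illustration: `foo`, `bar`, and the variable names `e`, `parts`, `head_name`, and `nested_name` are made up for the example, and nothing is evaluated because the call is quoted.

```r
# A quoted call is a list-like object: element 1 is the function being
# called, the remaining elements are its arguments.
e <- quote(foo(a, bar(b, c)))

head_name <- deparse(e[[1]])   # name of the outer function
parts <- as.list(e)            # foo, a, bar(b, c)

# An argument is either a symbol/constant or another call
arg_is_symbol <- is.name(parts[[2]])   # `a` is a plain symbol
arg_is_call   <- is.call(parts[[3]])   # `bar(b, c)` is itself a call

# The nested call can be taken apart in exactly the same way
nested_name <- deparse(parts[[3]][[1]])
```

Here `head_name` is "foo" and `nested_name` is "bar", which is the distinction between "function" and "argument" that the question asks for.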

You can set the descend= parameter to TRUE if you also want to look inside each reported call for further, nested function calls.

I'm sure there are plenty of packages that make this easier, but I just wanted to show how simple it really is.
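
If only the function names are wanted (as the question asks) rather than the full deparsed calls, one possible alternative is to let R's own parser label the tokens. This is a minimal sketch, not part of the original answer: `called_functions` and `demo` are hypothetical names, and it assumes the function body can be deparsed and re-parsed with `keep.source = TRUE`.

```r
# Hypothetical helper: return the unique names that appear in call position,
# using the token table produced by utils::getParseData().
called_functions <- function(f) {
    src <- deparse(body(f))                        # body as text lines
    exprs <- parse(text = src, keep.source = TRUE) # re-parse, keeping tokens
    pd <- utils::getParseData(exprs)
    unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
}

# Made-up function for illustration; foo, bar and baz need not exist
# because the code is only parsed, never run.
demo <- function(x, file) {
    y <- foo(x, bar(file, 1))
    if (is.null(y)) return(baz())
    y
}

res <- called_functions(demo)
```

Because the parser tags names by syntactic position, `file` and `x` are not reported when they are used only as arguments, while nested calls such as `bar` are. Keywords like `if` and `for` get their own token types and are not included, whereas `return` is.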
