Regex for extracting functions from C++ code

Question

I have sample C++ code (http://pastebin.com/6q7zs7tc) from which I have to extract function names as well as the number of parameters each function requires. So far I have written this regex, but it's not working perfectly for me:

(?![a-z])[^\:,>,\.]([a-z,A-Z]+[_]*[a-z,A-Z]*)+[(]


Recommended answer

You can't parse C++ reliably with regex.

In fact, you can't parse it with weak parsing technology at all (see "Why can't C++ be parsed with a LR(1) parser?"). If you expect to extract this information reliably from source files, you will need a time-tested C++ parser; see https://stackoverflow.com/a/28825789/120163

If you don't care that your extraction process is flaky, then you can use a regex and maybe some additional hackery. Your key problem for heuristic extraction is matching the various kinds of brackets, e.g., [...], <...> (which won't quite work because of shift operators) and {...}. Bracket matching requires you to keep a stack of the brackets seen so far. And bracket matching may fail in the presence of macros and preprocessor conditionals.
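
To make the "stack of seen brackets" idea concrete, here is a minimal sketch of the heuristic approach. It is not a parser and shares all the flakiness described above. The helper names `find_matching` and `count_params` are made up for this illustration, and the sketch assumes the source text contains no comments, string literals, macros, or preprocessor conditionals that would confuse it. It deliberately ignores < and >, since they cannot be told apart from comparison and shift operators at this level.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the bracket that closes the one
// at `open`, or std::string::npos if the brackets do not balance.
// Only (), [] and {} are tracked; '<' and '>' are skipped on purpose.
std::size_t find_matching(const std::string& text, std::size_t open) {
    assert(open < text.size());
    assert(text[open] == '(' || text[open] == '[' || text[open] == '{');
    std::vector<char> expected;  // closers still owed, innermost last
    for (std::size_t i = open; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '(') expected.push_back(')');
        else if (c == '[') expected.push_back(']');
        else if (c == '{') expected.push_back('}');
        else if (c == ')' || c == ']' || c == '}') {
            if (expected.empty() || expected.back() != c) return std::string::npos;
            expected.pop_back();
            if (expected.empty()) return i;  // the bracket at `open` is now closed
        }
    }
    return std::string::npos;  // ran out of text with brackets still open
}

// Hypothetical helper: counts the top-level, comma-separated entries of the
// parameter list whose '(' is at `open`. Returns -1 if the list is unbalanced.
int count_params(const std::string& text, std::size_t open) {
    const std::size_t close = find_matching(text, open);
    if (close == std::string::npos) return -1;
    int depth = 0;           // nesting depth inside the parameter list
    int commas = 0;          // commas seen at depth 0 only
    bool has_token = false;  // stays false for an empty list such as "()"
    for (std::size_t i = open + 1; i < close; ++i) {
        const char c = text[i];
        if (c == '(' || c == '[' || c == '{') ++depth;
        else if (c == ')' || c == ']' || c == '}') --depth;
        else if (c == ',' && depth == 0) ++commas;
        if (!std::isspace(static_cast<unsigned char>(c))) has_token = true;
    }
    return has_token ? commas + 1 : 0;
}
```

For the text `int add(int a, int b) { return a + b; }`, the opening parenthesis is at index 7, and `count_params(text, 7)` yields 2. The limits show up quickly: a parameter such as `std::map<int, int> m` is counted as two because the comma inside <...> looks top-level, and `f(void)` is counted as one. Cases like these are why a real C++ parser is needed for reliable results.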
