Regular expression to recognize variable declarations in C

Question

I'm working on a regular expression to recognize variable declarations in C, and this is what I have so far:

[a-zA-Z_][a-zA-Z0-9]*

Is there a better solution?

Recommended answer

Here is a pattern to recognize variable declarations in C. Looking at a conventional declaration, we see:

int variable;

In that case, one should test for the type keyword first, to avoid matching something else, such as a string or a constant defined with the preprocessor:

(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)

The variable name is captured in \1.
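As a quick check, the pattern above can be tried with Python's re module, where the first capture group is read with group(1). This is only a sketch, and the helper name find_variable is made up for illustration.

```python
import re

# The pattern from above: a non-capturing "type" word followed by
# whitespace, then the captured identifier.
DECLARATION = re.compile(r'(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)')

def find_variable(line):
    """Return the identifier captured in group 1, or None if nothing matches."""
    match = DECLARATION.search(line)
    return match.group(1) if match else None

print(find_variable("int variable;"))  # variable
```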

The feature you need is look-behind/look-ahead.
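One way look-behind and look-ahead could be applied is sketched below in Python. The pattern is an assumption made for illustration, not part of the original answer: it only handles the int type, and because Python's re module requires fixed-width look-behind, it expects exactly one whitespace character between the type and the name.

```python
import re

# Hypothetical look-around variant: the identifier must be preceded by
# "int" plus one whitespace character and followed by a delimiter.
# Neither the type nor the delimiter is part of the match itself.
INT_VARIABLE = re.compile(r'(?<=\bint\s)[a-zA-Z_][a-zA-Z0-9_]*(?=\s*[;,=\[)])')

source = "int count; goto jump; int total = 0;"
print(INT_VARIABLE.findall(source))  # ['count', 'total']
```

Since the assertions consume no characters, the match is the bare identifier and no capture group is needed.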

Update, July 11, 2015

The previous regex fails to match variable names that contain _ after the first character, and it also assumes names of two or more characters. To fix both problems, add _ to the second character class of the capture group and change its quantifier from + to *. This is how it looks after the fix:

(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9_]*)
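The difference between the two versions can be seen by running both on a name with an underscore and on a one-character name. The sketch below uses Python's re module; the names OLD, FIXED and captured are chosen for illustration only.

```python
import re

OLD = re.compile(r'(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9]+)')
FIXED = re.compile(r'(?:\w+\s+)([a-zA-Z_][a-zA-Z0-9_]*)')

def captured(pattern, line):
    """Return group 1 of the first match, or None if there is no match."""
    match = pattern.search(line)
    return match.group(1) if match else None

for line in ("int my_var;", "int i;"):
    print(line, captured(OLD, line), captured(FIXED, line))
# int my_var; my my_var
# int i; None i
```

The old pattern truncates my_var to my and misses i entirely, while the fixed pattern captures both names in full.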

However, this regular expression produces many false positives, goto jump; being one of them. Frankly, it is not suitable for the job, so I wrote another regex that covers a wider range of cases. It is far from perfect, but here it is:

\b(?:(?:auto\s*|const\s*|unsigned\s*|signed\s*|register\s*|volatile\s*|static\s*|void\s*|short\s*|long\s*|char\s*|int\s*|float\s*|double\s*|_Bool\s*|complex\s*)+)(?:\s+\*?\*?\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*[\[;,=)]

I've tested this regex with Ruby, Python and JavaScript, and it works well for the common cases, although it fails in some, as listed below. The regex could probably be optimized, but that is hard to do while keeping it portable across several regex engines.

unsignedchar *var;                   /* FAIL, matches +var+ (the \s* after each keyword lets +unsigned+ and +char+ run together) */
goto **label;                        /* OK, doesn't match */
int function();                      /* OK, doesn't match */
char **a_pointer_to_a_pointer;       /* OK, matches +a_pointer_to_a_pointer+ */
register unsigned char *variable;    /* OK, matches +variable+ */
long long factorial(int n)           /* OK, matches +n+ */
int main(int argc, int *argv[])      /* OK, matches +argc+ and +argv+ (needs two passes) */
const * char var;                    /* OK, matches +var+, however, it doesn't consider +const *+ as part of the declaration */
int i=0, j=0;                        /* 50%, matches +i+ but it will not match j after the first pass */
int (*functionPtr)(int,int);         /* FAIL, doesn't match (too complex) */
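A small harness makes it easy to re-run cases like these. The sketch below uses Python's re.finditer, which collects every non-overlapping match in a single scan. The names DECLARATION, declared_names and CASES are made up for illustration, and the pattern is the one above, split across lines only for readability.

```python
import re

# The pattern from above, split across lines only for readability.
DECLARATION = re.compile(
    r'\b(?:(?:auto\s*|const\s*|unsigned\s*|signed\s*|register\s*|volatile\s*'
    r'|static\s*|void\s*|short\s*|long\s*|char\s*|int\s*|float\s*|double\s*'
    r'|_Bool\s*|complex\s*)+)(?:\s+\*?\*?\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*[\[;,=)]'
)

def declared_names(line):
    """Collect group 1 of every non-overlapping match in the line."""
    return [match.group(1) for match in DECLARATION.finditer(line)]

CASES = [
    "goto **label;",
    "int function();",
    "char **a_pointer_to_a_pointer;",
    "register unsigned char *variable;",
    "int main(int argc, int *argv[])",
    "int i=0, j=0;",
    "int (*functionPtr)(int,int);",
]

for case in CASES:
    print(f"{case!r:40} -> {declared_names(case)}")
```

With finditer, both argc and argv are reported from one scan of the main signature. The j in int i=0, j=0; is still missed, because no type keyword directly precedes it.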

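False positives

The following case is hard to cover with a portable regular expression; text editors use context to avoid highlighting text inside quotes.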

printf("int i=%d", i);               /* FAIL, matches +i+ inside quotes */
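One possible workaround, which is not part of the original answer, is to blank out string literals in a separate step before applying the declaration regex. The Python sketch below assumes double-quoted literals with backslash escapes and does not handle comments or character literals; STRING_LITERAL, blank_strings and declared_names are hypothetical names.

```python
import re

# The declaration pattern from above, split across lines for readability.
DECLARATION = re.compile(
    r'\b(?:(?:auto\s*|const\s*|unsigned\s*|signed\s*|register\s*|volatile\s*'
    r'|static\s*|void\s*|short\s*|long\s*|char\s*|int\s*|float\s*|double\s*'
    r'|_Bool\s*|complex\s*)+)(?:\s+\*?\*?\s*)([a-zA-Z_][a-zA-Z0-9_]*)\s*[\[;,=)]'
)

# A double-quoted literal: escaped characters or anything except " and \.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')

def blank_strings(source):
    """Replace every double-quoted string literal with an empty one."""
    return STRING_LITERAL.sub('""', source)

def declared_names(source):
    """Apply the declaration regex only after string contents are removed."""
    return [m.group(1) for m in DECLARATION.finditer(blank_strings(source))]

line = 'printf("int i=%d", i);'
print(blank_strings(line))    # printf("", i);
print(declared_names(line))   # []
```

Doing this in two steps keeps the declaration regex itself unchanged, at the cost of a pre-processing pass.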

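False positives (syntax errors)

This can be fixed by checking the syntax of the source file before applying the regular expression. With GCC and Clang, passing the -fsyntax-only flag checks the syntax of a source file without compiling it.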

int char variable;                  /* matches +variable+ */
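The syntax check could be scripted before the regex is applied. The following Python sketch assumes gcc (or another compiler accepting -fsyntax-only, such as clang) is on the PATH; the function name has_valid_syntax is made up for illustration.

```python
import shutil
import subprocess

def has_valid_syntax(path, compiler="gcc"):
    """Run `<compiler> -fsyntax-only <path>`.

    Returns True or False depending on the exit status, or None when the
    compiler is not installed.
    """
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        [compiler, "-fsyntax-only", path],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Only files for which this returns True would then be passed on to the regex, so a line such as int char variable; is rejected by the compiler before it can produce a match.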
