Compiling/Matching POSIX Regular Expressions in C

Question

I'm trying to match the following items in the string pcode:

  • u followed by a 1 or 2 digit number
  • phaseu
  • phasep
  • x (surrounded by non-word chars)
  • y (surrounded by non-word chars)
  • z (surrounded by non-word chars)

I've tried to implement a regex match using the POSIX regex functions (shown below), but have two problems:

  1. The compiled pattern doesn't seem to have any subpatterns (i.e., nsub == 0).
  2. The pattern doesn't find a match in the string "u0", which it really should!

I'm confident that the regex string itself works, since it works in Python and TextMate; my problem lies with the compilation, etc. in C. Any help with getting that working would be much appreciated.

Thanks in advance for your answers.

if(idata=tb_find(deftb,pdata)){
    MESSAGE("Global variable!\n");
    char pattern[80] = "((u[0-9]{1,2})|(phaseu)|(phasep)|[\\W]+([xyz])[\\W]+)";
    MESSAGE("Pattern = \"%s\"\n",pattern);
    regex_t compiled;
    if(regcomp(&compiled, pattern, 0) == 0){
        MESSAGE("Compiled regular expression \"%s\".\n", pattern);
    }

    int nsub = compiled.re_nsub;
    MESSAGE("nsub = %d.\n",nsub);
    regmatch_t matchptr[nsub];
    int err;
    if(err = regexec (&compiled, pcode, nsub, matchptr, 0)){
        if(err == REG_NOMATCH){
            MESSAGE("Regular expression did not match.\n");
        }else if(err == REG_ESPACE){
            MESSAGE("Ran out of memory.\n");
        }
    }
    regfree(&compiled);
}

Recommended answer

It seems you intend to use something resembling the "extended" POSIX regex syntax. POSIX defines two different regex syntaxes, a "basic" (read "obsolete") syntax and the "extended" syntax. To use the extended syntax, you need to add the REG_EXTENDED flag for regcomp:

...
if(regcomp(&compiled, pattern, REG_EXTENDED) == 0){
...

Without this flag, regcomp will use the "basic" regex syntax. There are some important differences, such as:

  • No support for the | operator
  • The brackets for submatches need to be escaped, \( and \)
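
The effect of the flag on subexpression counting can be checked directly. The following is a minimal sketch; count_subexpressions is a hypothetical helper name, not part of the POSIX API. It compiles a pattern with the given flags and reports the re_nsub value that regcomp stored.

```c
#include <regex.h>

/* Hypothetical helper (not a POSIX function): returns the number of
 * parenthesized subexpressions regcomp() found in `pattern` when compiled
 * with `cflags`, or -1 if the pattern does not compile. */
int count_subexpressions(const char *pattern, int cflags)
{
    regex_t compiled;

    if (regcomp(&compiled, pattern, cflags) != 0) {
        return -1;                      /* compilation failed */
    }

    int nsub = (int)compiled.re_nsub;   /* filled in by regcomp() */
    regfree(&compiled);
    return nsub;
}
```

With "(a)|(b)", passing 0 as the flags yields 0 subexpressions, because the parentheses and the | are ordinary characters in basic syntax, while passing REG_EXTENDED yields 2. This is consistent with the nsub == 0 symptom described in the question.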

It should be also noted that the POSIX extended regex syntax is not 1:1 compatible with Python's regex (don't know about TextMate). In particular, I'm afraid this part of your regexp does not work in POSIX, or at least is not portable:

 [\\W]

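POSIX bracket expressions use named character classes instead. Note that \W means non-word characters, for which the closest POSIX form is [^[:alnum:]_]; the POSIX way to specify non-space characters is: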

 [^[:space:]]
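
The difference between a non-space class and a non-word class can be seen with a small test function. This is a sketch under the assumption that "non-word" means "not a letter, digit, or underscore", written as [^[:alnum:]_]; matches_ere is a hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` contains a match for the
 * extended regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile or regexec() reports another error. */
int matches_ere(const char *pattern, const char *text)
{
    regex_t compiled;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }

    int err = regexec(&compiled, text, 0, NULL, 0);
    regfree(&compiled);

    if (err == 0) {
        return 1;
    }
    return (err == REG_NOMATCH) ? 0 : -1;
}
```

For example, "[^[:alnum:]_]+[xyz][^[:alnum:]_]+" matches " x " but not "axb", whereas "[^[:space:]]+[xyz][^[:space:]]+" matches "axb" but not " x ". Which of the two is appropriate depends on what should be allowed around x, y, and z.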

Your whole regexp for POSIX should then look like this in C:

 char *pattern = "((u[0-9]{1,2})|(phaseu)|(phasep)|[^[:space:]]+([xyz])[^[:space:]]+)";
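
Putting the pieces together, a match routine might look roughly like the sketch below. It is not the asker's original code: first_match is a hypothetical helper, and the return convention (0 for a match, 1 for no match, -1 for an error) is an assumption made for illustration. One detail worth noting is that regexec stores the whole match in element 0 of the match array and the subexpressions in elements 1 to re_nsub, so the array needs re_nsub + 1 entries rather than re_nsub.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compiles `pattern` as an extended regex, searches
 * `text`, and copies the text of the whole match into `out` (truncated to
 * fit `outlen`, always NUL-terminated).
 * Returns 0 on a match, 1 if nothing matched, -1 on any error. */
int first_match(const char *pattern, const char *text,
                char *out, size_t outlen)
{
    regex_t compiled;

    if (outlen == 0 || regcomp(&compiled, pattern, REG_EXTENDED) != 0) {
        return -1;
    }

    /* Slot 0 is the whole match; slots 1..re_nsub are the subexpressions. */
    size_t nmatch = compiled.re_nsub + 1;
    regmatch_t *matches = malloc(nmatch * sizeof *matches);
    if (matches == NULL) {
        regfree(&compiled);
        return -1;
    }

    int err = regexec(&compiled, text, nmatch, matches, 0);
    int result;

    if (err == 0) {
        size_t len = (size_t)(matches[0].rm_eo - matches[0].rm_so);
        if (len >= outlen) {
            len = outlen - 1;           /* truncate to the buffer size */
        }
        memcpy(out, text + matches[0].rm_so, len);
        out[len] = '\0';
        result = 0;
    } else if (err == REG_NOMATCH) {
        result = 1;
    } else {
        result = -1;                    /* e.g. REG_ESPACE */
    }

    free(matches);
    regfree(&compiled);
    return result;
}
```

Called with the pattern above and the string "u0", this returns 0 and copies "u0" into the output buffer.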
