Why does regexec() in POSIX C always return only the first match, and how can it return all match positions in a single run?

Question

I want to return all match positions in a string such as:

abcd123abcd123abcd

Suppose I want to find every "abcd". I have to call regexec() to get the first match, at positions 0 to 3, and then pass the remainder:

123abcd123abcd

to regexec() again as a new string, and so on. I read the manual for regexec(), which says:

int regexec(const regex_t *preg, const char *string, size_t nmatch,
               regmatch_t pmatch[], int eflags);
nmatch and pmatch are used to provide information regarding the location of any 
matches.

But why doesn't this work? This is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

int main(int argc, char **argv)
{
   int i = 0;
   int res;
   int len;
   char result[BUFSIZ];
   char err_buf[BUFSIZ];
   char* src = argv[1];  

   const char* pattern = "\\<[^,;]+\\>";
   regex_t preg;

   regmatch_t pmatch[10];

   if( (res = regcomp(&preg, pattern, REG_EXTENDED)) != 0)
   {
      regerror(res, &preg, err_buf, BUFSIZ);
      printf("regcomp: %s\n", err_buf);
      exit(res);
   }

   res = regexec(&preg, src, 10, pmatch, REG_NOTBOL);
   //~ res = regexec(&preg, src, 10, pmatch, 0);
   //~ res = regexec(&preg, src, 10, pmatch, REG_NOTEOL);
   if(res == REG_NOMATCH)
   {
      printf("NO match\n");
      exit(0);
   }
   for (i = 0; i < 10 && pmatch[i].rm_so != -1; i++)  /* stay inside pmatch[10] */
   {
      len = pmatch[i].rm_eo - pmatch[i].rm_so;
      memcpy(result, src + pmatch[i].rm_so, len);
      result[len] = 0;
      printf("num %d: '%s'\n", i, result);
   }
   regfree(&preg);
   return 0;
}

./regex 'hello, world'

Output:

num 0: 'hello'

This is the output I expected:

num 0: 'hello'
num 1: 'world'

Recommended answer

regexec() performs a single regex match. Once it finds a match, it returns zero (success), and pmatch describes that one match only. pmatch[0] holds the offsets of the entire match, and the subsequent elements hold the offsets of the capture groups (subexpressions).

Demonstration (note that \w is a GNU extension rather than standard POSIX extended regex syntax):

const char* pattern = "(\\w+) (\\w+)";

Matched against "hello world", this outputs:

num 0: 'hello world'  - entire match
num 1: 'hello'        - capture group 1
num 2: 'world'        - capture group 2
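
The indexing of pmatch can be checked with a small sketch like the one below. The helper name match_group and the MAX_GROUPS limit are made up for illustration, and the pattern in the usage note uses the portable class [[:alnum:]_] in place of \w; treat it as a sketch under those assumptions, not as the answer's original code.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10   /* illustrative limit, mirrors pmatch[10] in the question */

/* Hypothetical helper (not part of POSIX): runs ONE regexec() call and copies
 * the text of group `idx` into buf. Index 0 is the entire match, index 1 and
 * up are capture groups. Returns 0 on success, -1 on any failure. */
int match_group(const char *pattern, const char *text, size_t idx,
                char *buf, size_t buflen)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int rc = -1;

    if (idx >= MAX_GROUPS || buflen == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Groups that did not take part in the match have rm_so == -1. */
    if (regexec(&re, text, MAX_GROUPS, pm, 0) == 0 && pm[idx].rm_so != -1) {
        size_t len = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
        if (len < buflen) {
            memcpy(buf, text + pm[idx].rm_so, len);
            buf[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

Called with the pattern "([[:alnum:]_]+) ([[:alnum:]_]+)" on "hello world", index 0 yields the whole match and indices 1 and 2 yield the two words. Index 3 fails because the pattern has only two groups.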

In most regex environments you would get the behaviour you want with the global modifier, /g. regexec() does not offer this as a flag, and it does not support modifiers at all. To get all matches you therefore have to call regexec() in a loop for as long as it returns zero, starting each new search just past the end of the previous match (that is, at the previous start pointer plus rm_eo).
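
A minimal sketch of such a loop follows. The helper name find_all_matches and its parameters are assumptions introduced here for illustration and are not part of POSIX. It also assumes that REG_NOTBOL is appropriate for every search after the first, since the remaining text no longer begins at the start of the line, and it steps one character forward after an empty match so the loop always makes progress.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of POSIX): collects up to max_out matches of
 * an already compiled pattern, storing offsets relative to the start of src.
 * Returns the number of matches stored in out. */
size_t find_all_matches(const regex_t *preg, const char *src,
                        regmatch_t *out, size_t max_out)
{
    size_t count = 0;
    size_t offset = 0;   /* where the next search starts within src */
    regmatch_t m;

    while (count < max_out) {
        /* Only the very first search begins at the real start of the line. */
        int eflags = (offset == 0) ? 0 : REG_NOTBOL;

        if (regexec(preg, src + offset, 1, &m, eflags) != 0)
            break;       /* REG_NOMATCH or an error: stop searching */

        /* regexec() reports offsets relative to src + offset. */
        out[count].rm_so = (regoff_t)offset + m.rm_so;
        out[count].rm_eo = (regoff_t)offset + m.rm_eo;
        count++;

        if (m.rm_eo == m.rm_so) {
            /* Empty match: advance by one character to avoid looping forever. */
            if (src[offset + (size_t)m.rm_eo] == '\0')
                break;
            offset += (size_t)m.rm_eo + 1;
        } else {
            offset += (size_t)m.rm_eo;
        }
    }
    return count;
}
```

For the string abcd123abcd123abcd and the pattern abcd, this reports three matches with rm_so/rm_eo pairs 0/4, 7/11 and 14/18. Note that rm_eo is the offset of the first character after the match, so the first match covers positions 0 to 3.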

The global modifier is not available in the PCRE library (a well-known regex library for C) either. The PCRE man pages have this to say about it:

By calling pcre_exec() multiple times with appropriate arguments, you can mimic Perl's /g option
