How do you capture a group with regex?

Question

I'm trying to extract a string from another using regex. I'm using the POSIX regex functions (regcomp, regexec ...), and I fail at capturing a group ...

For instance, let the pattern be something as simple as "MAIL FROM:<(.*)>"
(with REG_EXTENDED cflags)

I want to capture everything between '<' and '>'

My problem is that regmatch_t gives me the boundaries of the whole pattern (MAIL FROM:<...>) instead of just what's between the parentheses ...

What am I missing?

Thanks in advance.

Edit: some code

#include <regex.h>
#include <stdio.h>

#define SENDER_REGEX "MAIL FROM:<(.*)>"

int main(int ac, char **av)
{
  regex_t regex;
  int status;
  regmatch_t pmatch[1];  /* only slot 0 (the whole match) is requested */

  if (ac < 2)
    return (1);
  if (regcomp(&regex, SENDER_REGEX, REG_ICASE|REG_EXTENDED) != 0)
  {
    printf("regcomp error\n");
    return (1);
  }
  status = regexec(&regex, av[1], 1, pmatch, 0);
  regfree(&regex);
  if (!status)
      /* rm_eo is the offset of the first character after the match */
      printf(  "matched from %d (%c) to %d (%c)\n"
             , (int)pmatch[0].rm_so
             , av[1][pmatch[0].rm_so]
             , (int)pmatch[0].rm_eo
             , av[1][pmatch[0].rm_eo]
            );

  return (0);
}

Output:

$./a.out "012345MAIL FROM:<abcd>$"
matched from 6 (M) to 22 ($)

Solution:

As RarrRarrRarr said, the offsets of the captured group are in pmatch[1].rm_so and pmatch[1].rm_eo.
Hence regmatch_t pmatch[1]; becomes regmatch_t pmatch[2];
and regexec(&regex, av[1], 1, pmatch, 0); becomes regexec(&regex, av[1], 2, pmatch, 0);
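
Put together, the fix might look like the sketch below. The helper name extract_sender and its return convention are made up for illustration and are not part of the original program; only the regcomp, regexec and regfree calls and the pattern come from the question.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define SENDER_REGEX "MAIL FROM:<(.*)>"

/* Hypothetical helper (not from the original post): copies the text of
 * capture group 1 into out. Returns 0 on success, -1 if the pattern does
 * not compile, the line does not match, or out is too small. */
int extract_sender(const char *line, char *out, size_t outsz)
{
    regex_t regex;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first group */
    int status;
    size_t len;

    if (regcomp(&regex, SENDER_REGEX, REG_ICASE | REG_EXTENDED) != 0)
        return -1;
    /* nmatch is 2 so that regexec also fills in pmatch[1] */
    status = regexec(&regex, line, 2, pmatch, 0);
    regfree(&regex);
    if (status != 0 || pmatch[1].rm_so == -1)
        return -1;

    /* rm_eo is one past the last matched character, so the
     * difference is the length of the captured text */
    len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
    if (len >= outsz)
        return -1;
    memcpy(out, line + pmatch[1].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Called with the input from the question, "012345MAIL FROM:<abcd>$", this should leave "abcd" in out (offsets 17 to 21). Note that .* is greedy, so on a line with more than one '>' the group runs up to the last one; a pattern such as <([^>]*)> is a common alternative when that matters.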

Thanks :)

Recommended answer

As you have noticed, the 0th element of the pmatch array of regmatch_t structs contains the boundaries of the whole matched string. In your example, you are interested in the regmatch_t at index 1, not at index 0, in order to get information about the string matched by the subexpression.
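
To make the indexing concrete, here is a rough sketch that prints every slot regexec fills in. The function print_groups and the fixed limit of 10 slots are assumptions chosen for the example. It relies on the re_nsub field that regcomp sets to the number of parenthesized subexpressions, and on rm_so being -1 for a group that did not take part in the match.

```c
#include <regex.h>
#include <stdio.h>

#define MAX_SLOTS 10  /* assumption: whole match plus at most 9 groups */

/* Hypothetical helper: prints each group that took part in the match.
 * Returns how many slots were set, or -1 on compile error or no match. */
int print_groups(const char *pattern, const char *text)
{
    regex_t regex;
    regmatch_t pmatch[MAX_SLOTS];
    size_t nslots, i;
    int status, found = 0;

    if (regcomp(&regex, pattern, REG_EXTENDED) != 0)
        return -1;
    /* slot 0 is the whole match, slots 1..re_nsub are the groups */
    nslots = regex.re_nsub + 1;
    if (nslots > MAX_SLOTS)
        nslots = MAX_SLOTS;
    status = regexec(&regex, text, nslots, pmatch, 0);
    regfree(&regex);
    if (status != 0)
        return -1;

    for (i = 0; i < nslots; i++) {
        if (pmatch[i].rm_so == -1)
            continue;  /* this group did not participate */
        printf("group %d: [%d, %d) \"%.*s\"\n", (int)i,
               (int)pmatch[i].rm_so, (int)pmatch[i].rm_eo,
               (int)(pmatch[i].rm_eo - pmatch[i].rm_so),
               text + pmatch[i].rm_so);
        found++;
    }
    return found;
}
```

For the pattern and input from the question, this would report group 0 as the whole "MAIL FROM:<abcd>" match and group 1 as "abcd".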

If you need more help, try editing your question to include an actual small code sample so that people can more easily spot the problem.
