When using regex in C, \d does not work but [0-9] does

Question

I do not understand why the regex pattern containing the \d character class does not work but [0-9] does. Character classes, such as \s (whitespace characters) and \w (word characters), do work. My compiler is gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3. I am using the C regular expression library.

Why does \d not work?

The text string:

const char *text = "148  apples    5 oranges";

For the above text string, this regex does not match:

const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";

This regex matches when using [0-9] instead of \d:

const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";



#include <stdio.h>
#include <stdlib.h>
#include <regex.h>

#define N_MATCHES  30

//   output from gcc --version: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
//   compile command used:  gcc -o tstc_regex tstc_regex.c

const char *text = "148  apples    5 oranges";
  const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";    // finds match
//const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";        // does not find match

int main(int argc, char**argv)
{
    regex_t   rgx;
    regmatch_t   matches[N_MATCHES];
    int status;
    status = regcomp(&rgx, rstr, REG_EXTENDED | REG_NEWLINE);
    if (status != 0) {
        fprintf(stdout, "regcomp error: %d\n", status);
        return 1;
    }
    status = regexec(&rgx, text, N_MATCHES, matches, 0);
    regfree(&rgx);    // release the compiled pattern; offsets already stored in matches[] remain valid
    if (status == REG_NOMATCH) {
        fprintf(stdout, "regexec result: REG_NOMATCH (%d)\n", status);
    }
    else if (status != 0) {
        fprintf(stdout, "regexec error: %d\n", status);
        return 1;
    }
    else {
        fprintf(stdout, "regexec match found: %d\n", status);
    }
    return 0;
}

Recommended answer

The regex flavor you're using is GNU ERE, which is similar to POSIX ERE, but with a few extra features. Among these are support for the character class shorthands \s, \S, \w and \W, but not \d and \D. You can find more info here.
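
If the pattern should not depend on GNU extensions at all, one option is to spell every class as a POSIX bracket expression: [[:digit:]] for \d, [[:space:]] for \s, and [[:alnum:]_] for \w. The following is a minimal sketch of that idea, not part of the original answer. The helper name ere_matches and the variable portable_rstr are hypothetical names introduced only for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post).
 * Returns 1 if `text` matches the POSIX extended regex `pattern`,
 * 0 if it does not match, and -1 if the pattern fails to compile
 * or regexec reports an error other than REG_NOMATCH. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t rgx;
    int status;

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    if (regcomp(&rgx, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }

    status = regexec(&rgx, text, 0, NULL, 0);
    regfree(&rgx);

    if (status == 0) {
        return 1;
    }
    if (status == REG_NOMATCH) {
        return 0;
    }
    return -1;
}

/* Same shape as the pattern in the question, written with POSIX
 * bracket classes only (no \d, \s or \w shorthands). */
const char *portable_rstr =
    "^[[:digit:]]+[[:space:]]+[[:alnum:]_]+"
    "[[:space:]]+[[:digit:]]+[[:space:]]+[[:alnum:]_]+$";
```

Calling ere_matches(portable_rstr, "148  apples    5 oranges") should report a match on any POSIX-conforming regcomp, whereas the result for a pattern containing \d depends on the regex library in use.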
