When using regex in C, \d does not work but [0-9] does
Question
I do not understand why a regex pattern containing the \d character class does not work but [0-9] does. Other character classes, such as \s (whitespace characters) and \w (word characters), do work. My compiler is gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3. I am using the C regular expression library (<regex.h>).
Why does \d not work?
Text string:
const char *text = "148 apples 5 oranges";
For the above text string, this regex does not match:
const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$";
This regex matches when using [0-9] instead of \d:
const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$";
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#define N_MATCHES 30
// output from gcc --version: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
// compile command used: gcc -o tstc_regex tstc_regex.c
const char *text = "148 apples 5 oranges";
const char *rstr = "^[0-9]+\\s+\\w+\\s+[0-9]+\\s+\\w+$"; // finds match
//const char *rstr = "^\\d+\\s+\\w+\\s+\\d+\\s+\\w+$"; // does not find match
int main(void)
{
    regex_t rgx;
    regmatch_t matches[N_MATCHES];
    int status;
    // compile the pattern as an extended regular expression
    status = regcomp(&rgx, rstr, REG_EXTENDED | REG_NEWLINE);
    if (status != 0) {
        fprintf(stdout, "regcomp error: %d\n", status);
        return 1;
    }
    // run the compiled pattern against the text
    status = regexec(&rgx, text, N_MATCHES, matches, 0);
    regfree(&rgx); // release the compiled pattern on every path
    if (status == REG_NOMATCH) {
        fprintf(stdout, "regexec result: REG_NOMATCH (%d)\n", status);
    }
    else if (status != 0) {
        fprintf(stdout, "regexec error: %d\n", status);
        return 1;
    }
    else {
        fprintf(stdout, "regexec match found: %d\n", status);
    }
    return 0;
}
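A side note on the program above: printing only the numeric status makes it hard to tell a failed compile from a plain non-match. The following is a minimal sketch, assuming a hypothetical helper named try_pattern (it is not part of the original post), that uses regerror to turn status codes into readable messages and keeps the three outcomes apart.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not from the original program: compiles `pattern`
 * as an extended regular expression and runs it against `text`.
 * Returns 1 on a match, 0 on no match, -1 on a compile or runtime error. */
int try_pattern(const char *pattern, const char *text)
{
    regex_t rgx;
    char msg[256];
    int status;

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets */
    status = regcomp(&rgx, pattern, REG_EXTENDED | REG_NOSUB);
    if (status != 0) {
        regerror(status, &rgx, msg, sizeof msg);
        fprintf(stderr, "regcomp failed for \"%s\": %s\n", pattern, msg);
        return -1;
    }

    status = regexec(&rgx, text, 0, NULL, 0);
    if (status != 0 && status != REG_NOMATCH) {
        regerror(status, &rgx, msg, sizeof msg);
        fprintf(stderr, "regexec failed for \"%s\": %s\n", pattern, msg);
    }

    regfree(&rgx);
    if (status == 0)
        return 1;
    return status == REG_NOMATCH ? 0 : -1;
}
```

Calling try_pattern once with each of the two rstr patterns from the question shows whether a pattern is rejected at compile time or simply fails to match. What an implementation does with an unsupported escape such as \d can vary between C libraries.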
Recommended answer
The regex flavor you're using is GNU ERE, which is similar to POSIX ERE but with a few extra features. Among these is support for the character class shorthands \s, \S, \w and \W, but not \d and \D, so use [0-9] or the POSIX bracket expression [[:digit:]] in place of \d. You can find more info here.
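Since \s and \w are themselves GNU extensions, a more portable route is to spell every shorthand as a POSIX bracket expression. A minimal sketch, assuming a hypothetical function name digits_then_words (not from the original post) and the same "number word number word" shape as the question's pattern:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` has the form
 * "<digits> <word> <digits> <word>", 0 if it does not,
 * and -1 if the pattern fails to compile. */
int digits_then_words(const char *text)
{
    /* [[:digit:]] stands in for \d, [[:space:]] for \s,
     * and [[:alnum:]_] for \w */
    static const char *pattern =
        "^[[:digit:]]+[[:space:]]+[[:alnum:]_]+[[:space:]]+"
        "[[:digit:]]+[[:space:]]+[[:alnum:]_]+$";
    regex_t rgx;
    int status;

    if (regcomp(&rgx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    status = regexec(&rgx, text, 0, NULL, 0);
    regfree(&rgx);
    return status == 0 ? 1 : 0;
}
```

With this version, digits_then_words("148 apples 5 oranges") reports a match without relying on any GNU-specific escapes.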