C Regular Expressions: Extracting the Actual Matches


Problem description

I am using regular expressions in C (using the "regex.h" library). After setting up the standard calls (and checks) for regcomp(...) and regexec(...), I can only manage to print the actual substrings that match my compiled regular expression. Using regexec, according to the manual pages, means you store the substring matches in a structure known as "regmatch_t". The struct only contains rm_so and rm_eo, which I understand to reference the addresses of the characters of the matched substring in memory, but my question is: how can I use these two offsets and two pointers to extract the actual substring and store it in an array (ideally a 2D array of strings)?

It works when you just print to standard out, but whenever you try to use the same setup and store the result in a string/character array, it stores the entire string that was originally matched against the expression. Further, what is the "%.*s" inside the print statement? I imagine it's a regular expression in and of itself that reads the pointers into a character array correctly. I just want to store the matched substrings inside a collection so I can work with them elsewhere in my software.

Background: p and p2 are both pointers set to point to the start of the string to match before entering the while loop in the code below: [EDIT: "matches" is a 2D array meant to ultimately store the substring matches and was preallocated/initialized before the main loop you see below]

int ind = 0;
while(1){
    regExErr1 = regexec(&r, p, 10, m, 0);
    //printf("Did match regular expr, value %i\n", regExErr1);
    if( regExErr1 != 0 ){ 
        fprintf(stderr, "No more matches with the inherent regular expression!\n"); 
        break; 
    }   
    printf("What was found was: ");
    int i = 0;
    while(1){
        if(m[i].rm_so == -1){
            break;
        }
        int start = m[i].rm_so + (p - p2);
        int finish = m[i].rm_eo + (p - p2);
        strcpy(matches[ind], ("%.*s\n", (finish - start), p2 + start));
        printf("Storing:  %.*s", matches[ind]);
        ind++;
        printf("%.*s\n", (finish - start), p2 + start);
        i++;
    }
    p += m[0].rm_eo; // this will move the pointer p to the end of last matched pattern and on to the start of a new one
}
printf("We have in [0]:  %s\n", temp);

Solution

There are quite a lot of regular expression packages, but yours seems to match the one in POSIX: regcomp() etc.

The two structures it defines in <regex.h> are:

  • regex_t containing at least size_t re_nsub, the number of parenthesized subexpressions.

  • regmatch_t containing at least regoff_t rm_so, the byte offset from start of string to start of substring, and regoff_t rm_eo, the byte offset from start of string of the first character after the end of substring.

Note that 'offsets' are not pointers but indexes into the character array.
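Because the offsets are plain indexes, extracting a match comes down to copying `rm_eo - rm_so` bytes starting at `string + rm_so` and adding a terminator. The same idea explains `%.*s`: it is not a regular expression but an ordinary printf conversion whose precision is taken from an `int` argument, so printf prints at most that many bytes of the string. The following is only a minimal sketch; `copy_match` is a hypothetical helper name, not part of `<regex.h>`.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of <regex.h>): copy the substring that
 * one regmatch_t describes out of `subject` into `dst`, truncating if
 * `dst` is too small. Returns the number of bytes copied (terminator
 * excluded), or -1 if this group did not take part in the match. */
static int copy_match(const char *subject, const regmatch_t *m,
                      char *dst, size_t dstsize)
{
    assert(subject != NULL && m != NULL && dst != NULL && dstsize > 0);

    if (m->rm_so == -1)
        return -1;

    size_t len = (size_t)(m->rm_eo - m->rm_so);  /* rm_eo is one past the end */
    if (len >= dstsize)
        len = dstsize - 1;                       /* leave room for '\0' */

    memcpy(dst, subject + m->rm_so, len);        /* offsets index into subject */
    dst[len] = '\0';
    return (int)len;
}
```

After a successful `regexec(&re, text, nmatch, m, 0)`, calling `copy_match(text, &m[0], buf, sizeof buf)` would put the whole match in `buf`, and `&m[1]` onwards would give the parenthesized groups.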

The execution function is:

  • int regexec(const regex_t *restrict preg, const char *restrict string, size_t nmatch, regmatch_t pmatch[restrict], int eflags);

Your printing code should be:

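for (size_t i = 0; i <= r.re_nsub && i < 10; i++)   /* m[0] is the whole match; 10 is the nmatch passed to regexec */
{
    if (m[i].rm_so == -1)
        continue;                                   /* this group did not take part in the match */
    int start = m[i].rm_so;
    int finish = m[i].rm_eo;
    /* strcpy() with a ("%.*s", ...) comma expression copies the whole tail of the string; */
    /* copy exactly (finish - start) bytes and terminate instead (needs <string.h>) */
    memcpy(matches[ind], p + start, finish - start);
    matches[ind][finish - start] = '\0';
    printf("Storing:  %s\n", matches[ind]);
    ind++;
    printf("%.*s\n", (finish - start), p + start);
}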

Note that this code should be upgraded to ensure that the string copy does not overflow the target string. It is also a good idea to mark the start and end of a string, for example like:

    printf("<<%.*s>>\n", (finish - start), p + start);

This makes it a whole heap easier to see spaces etc.
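One possible way to combine the overflow check with the original goal of storing every match in a 2D array is sketched below. It relies on `snprintf` with `%.*s` to bound each copy. The names `collect_matches`, `MAX_MATCHES` and `MAX_LEN` are assumptions made for illustration, and the sketch stores only the whole match (`m[0]`) of each hit, not the subexpressions.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_MATCHES 16   /* assumed capacity of the result array */
#define MAX_LEN     64   /* assumed size of each stored string   */

/* Hypothetical helper: store every non-overlapping match of `pattern`
 * in `text` into `out` and return how many were stored, or -1 if the
 * pattern does not compile. Over-long matches are truncated. */
static int collect_matches(const char *pattern, const char *text,
                           char out[][MAX_LEN], int max_out)
{
    regex_t re;
    regmatch_t m[1];
    int count = 0;
    const char *p = text;

    assert(pattern != NULL && text != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (count < max_out)
    {
        /* After the first search, p no longer points at the start of a line */
        int eflags = (p == text) ? 0 : REG_NOTBOL;
        if (regexec(&re, p, 1, m, eflags) != 0)
            break;

        int len = (int)(m[0].rm_eo - m[0].rm_so);

        /* snprintf writes at most MAX_LEN bytes, terminator included */
        snprintf(out[count], MAX_LEN, "%.*s", len, p + m[0].rm_so);
        count++;

        p += m[0].rm_eo;         /* offsets are relative to p, not to text */
        if (len == 0)            /* empty match: step forward or loop forever */
        {
            if (*p == '\0')
                break;
            p++;
        }
    }

    regfree(&re);
    return count;
}
```

With `char out[MAX_MATCHES][MAX_LEN];`, a call such as `collect_matches("[0-9]+", "a12b345c6", out, MAX_MATCHES)` would fill `out[0]`, `out[1]` and `out[2]` with the three digit runs. Advancing `p` by `rm_eo` on each pass is the same technique the question uses; no separate `p2` pointer is needed because each copy is taken relative to `p`.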

[In future, please attempt to provide an SSCCE (Short, Self-Contained, Correct Example) so that people can help more easily.]

This is an SSCCE that I created, probably in response to another SO question in 2010. It is one of a number of programs I keep that I call 'vignettes': little programs that show the essence of some feature (such as POSIX regexes, in this case). I find them useful as memory joggers.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <regex.h>

#define tofind    "^DAEMONS=\\(([^)]*)\\)[ \t]*$"

int main(int argc, char **argv)
{
    FILE *fp;
    char line[1024];
    int retval = 0;
    regex_t re;
    regmatch_t rm[2];
    //this file has this line "DAEMONS=(sysklogd network sshd !netfs !crond)"
    const char *filename = "/etc/rc.conf";

    if (argc > 1)
        filename = argv[1];

    if (regcomp(&re, tofind, REG_EXTENDED) != 0)
    {
        fprintf(stderr, "Failed to compile regex '%s'\n", tofind);
        return EXIT_FAILURE;
    }

    fp = fopen(filename, "r");
    if (fp == 0)
    {
        fprintf(stderr, "Failed to open file %s (%d: %s)\n", filename, errno, strerror(errno));
        return EXIT_FAILURE;
    }

    while ((fgets(line, sizeof(line), fp)) != NULL)
    {
        /* Strip the newline if there is one; safe even when the line has none */
        line[strcspn(line, "\n")] = '\0';
        if ((retval = regexec(&re, line, 2, rm, 0)) == 0)
        {
            printf("<<%s>>\n", line);
            printf("Line: <<%.*s>>\n", (int)(rm[0].rm_eo - rm[0].rm_so), line + rm[0].rm_so);
            printf("Text: <<%.*s>>\n", (int)(rm[1].rm_eo - rm[1].rm_so), line + rm[1].rm_so);
            char *src = line + rm[1].rm_so;
            char *end = line + rm[1].rm_eo;
            while (src < end)
            {
                size_t len = strcspn(src, " ");
                if (src + len > end)
                    len = end - src;
                printf("Name: <<%.*s>>\n", (int)len, src);
                src += len;
                src += strspn(src, " ");
            }
        }
    }
    fclose(fp);      /* release the file and the compiled regex before exiting */
    regfree(&re);
    return EXIT_SUCCESS;
}

This was designed to find a particular line starting DAEMONS= in a file /etc/rc.conf. You can adapt it to your purposes easily enough.
