Weird blank character behaviour in regex C


Problem description


I have a problem using regex in C. I want to collect a command (GET, PUT or DEL) and a filepath, to send the right command to a server.

If I compile only '[[:blank:]]*(GET|PUT|DEL|HELP)', the code works and I collect the right thing. However, when I add something to the expression, such as '[[:blank:]]*(GET|PUT|DEL|HELP)[[:blank:]]+([a-z])', regexec returns REG_NOMATCH.

Do you have a solution or do you know why?

This is my code:

#include <regex.h>
#include "dgb.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <stdio_ext.h>

#define MODE "client"

int main(int argc, char *argv[]) {

    regex_t preg;
    const char *str_regex = "[[:blank:]]*(GET|PUT|DEL|HELP)[[:blank:]]+([a-z])";
    char str_request[51];
    int reg_init;
    int reg_request;
    size_t nmatch = 0;
    regmatch_t *pmatch = NULL;       

    reg_init = regcomp(&preg, str_regex, REG_ICASE);

    if (reg_init != 0) {
        printf("Error\n");
        exit(EXIT_FAILURE);
    }

    nmatch = preg.re_nsub;
    pmatch = malloc(nmatch * sizeof(*pmatch));
    checkmem(pmatch);

    while(strcmp(str_request,"quit") != 0) {

        printf(">>");
        scanf("%50s", str_request);
        __fpurge(stdin); //fpurge on OSX

        reg_request = regexec(&preg, str_request, nmatch, pmatch, 0);

        if (reg_request == REG_NOMATCH) {
            printf("%s: Invalid command, please tap help\n", MODE);
        }

        else if (reg_request == 0) {

            char *cmd = NULL;
            int start = pmatch[0].rm_so;
            int end = pmatch[0].rm_eo;
            size_t size = end - start;

            cmd = malloc (sizeof (char*) * (size + 1));
            strncpy(cmd, &str_request[start], size);
            cmd[size] = '\0';
            printf ("%s\n", cmd);


            free(cmd);    
         }   
    }

    free(pmatch);        
}

Solution

There are two problems here:

  1. The %s conversion in scanf extracts a string of non-whitespace characters and stops at the first whitespace character it finds. When you input "GET something", only "GET" is read by this scanf call, so the regex never sees the blank or the path:

    scanf("%50s", str_request);
    

    One option is to change the code to use fgets to read the whole line of input. Note that the newline character is included in the buffer, so you have to handle it, for example by stripping it before matching.

  2. You are writing your regex in Extended Regular Expression (ERE) syntax, since you are using alternation |, grouping ( and ), and the one-or-more quantifier +.

    In POSIX Basic Regular Expression (BRE) syntax, which regcomp uses by default, | and + are not available as operators, and parentheses must be escaped as \( and \) to take on their grouping meaning.

    Therefore, the REG_EXTENDED flag is necessary to make your regex work as intended. Combine it with the flag you already pass, as REG_EXTENDED | REG_ICASE, in the call to regcomp.
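
The two fixes above can be sketched together as follows. This is a minimal illustration rather than a drop-in replacement for the question's program: the helper names read_request and parse_request are hypothetical, and the path group is widened from the question's single [a-z] to [a-z]+ on the assumption that the whole word is wanted. Note also that the match array needs re_nsub + 1 entries (here 3), because entry 0 holds the overall match.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one whole line with fgets and strip the
 * trailing newline. Returns 0 on success, -1 on EOF or read error. */
int read_request(FILE *in, char *buf, size_t len)
{
    if (fgets(buf, (int)len, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline fgets keeps */
    return 0;
}

/* Hypothetical helper: match a line such as "GET something" and copy the
 * command and the path into the caller's buffers.
 * Returns 0 on a match, REG_NOMATCH when the line does not match, or
 * another nonzero regcomp error code. */
int parse_request(const char *line, char *cmd, size_t cmd_len,
                  char *path, size_t path_len)
{
    /* Assumption: [a-z]+ instead of the question's [a-z], to capture
     * the whole word rather than only its first letter. */
    const char *pattern =
        "[[:blank:]]*(GET|PUT|DEL|HELP)[[:blank:]]+([a-z]+)";
    regex_t preg;
    regmatch_t pmatch[3];             /* whole match + 2 groups */

    /* REG_EXTENDED enables |, + and unescaped ( ) as operators. */
    int rc = regcomp(&preg, pattern, REG_EXTENDED | REG_ICASE);
    if (rc != 0)
        return rc;

    rc = regexec(&preg, line, 3, pmatch, 0);
    if (rc == 0) {
        /* %.*s copies exactly the matched span of each group. */
        snprintf(cmd, cmd_len, "%.*s",
                 (int)(pmatch[1].rm_eo - pmatch[1].rm_so),
                 line + pmatch[1].rm_so);
        snprintf(path, path_len, "%.*s",
                 (int)(pmatch[2].rm_eo - pmatch[2].rm_so),
                 line + pmatch[2].rm_so);
    }
    regfree(&preg);
    return rc;
}
```

In the question's loop, one would call read_request(stdin, str_request, sizeof str_request) in place of scanf and __fpurge, then pass the buffer to parse_request. Compiling the pattern on every call keeps the sketch short; a real program would likely compile it once before the loop, as the original code does.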

