Flex/Lex: Regular Expression matches double characters

Question

I have a flex program written in C++ that needs to complete the following rules:

I want yytext to accept the following:
○ Zero or one of the following characters ABCDEFGH

For example - input:
"triangle ABC" is a valid shape and I want the program to print "Valid shape"
"triangle AAC" is not a valid shape because it contains a double A and I want the program to print nothing in this case
"triangle ABCD" is not a valid shape because it contains four letters and I want the program to print nothing in this case too.

Below is the code and the regular expressions I have tried so far:

%{
    /** Methods and Variables initialization **/
   
%}

corner corner" "[A-H]
line line" "[A-H]{2}
triangle triangle" "[A-H]{3}
square rectangle" "[A-H]{4}
poly pentagon" "[A-H]{5}
hexa hexagon" "[A-H]{6}
hepta heptagon" "[A-H]{7}
octa octagon" "[A-H]{8}

/** Below is the rule section -- yytext is the matched string returned to the program **/
%%
{corner} |
{line} |
{triangle} |  
{square}  |
{poly} |
{hexa} |
{hepta} | 
{octa} {   
     printf("Valid shape: %s", yytext);
}
.
%%

int main() {
    yylex();    
    return 0;
}

// yywrap() is called at end of input; returning 1 tells the scanner there is no more input
int yywrap(void)
{
   return 1;
}


The current input:
triangle AAC
The current output:
Valid shape: triangle AAC (We don't want that)

The current input:
triangle AB
The current output:
Valid shape: triangle ABC

Recommended answer

This is not the sort of problem for which you would typically use (f)lex, since the base lexical analysis is trivial (it could be done by simply splitting the line at the space) and detailed error analysis is a bit outside of (f)lex's comfort zone, specifically because there's no way to match "a string containing the same character twice" using a regular expression.
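For comparison, the "split the line at the space" approach could look roughly like the C sketch below. It is not part of the answer: is_valid_shape and the shapes table are hypothetical names, the shape names and corner counts are copied from the definitions in the question, and the input is assumed to be a single line with its trailing newline already removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shape names and corner counts, taken from the question's definitions. */
static const struct { const char *name; size_t corners; } shapes[] = {
    {"corner", 1}, {"line", 2}, {"triangle", 3}, {"rectangle", 4},
    {"pentagon", 5}, {"hexagon", 6}, {"heptagon", 7}, {"octagon", 8},
};

/* Hypothetical helper: returns 1 if `line` is "<shape> <letters>" with the
 * right number of distinct letters from A-H, and 0 otherwise. */
int is_valid_shape(const char *line)
{
    assert(line != NULL);

    /* Split at the first space: name before it, letters after it. */
    const char *space = strchr(line, ' ');
    if (space == NULL)
        return 0;
    size_t name_len = (size_t)(space - line);
    const char *letters = space + 1;
    size_t n_letters = strlen(letters);

    /* Look up the shape name to find the expected corner count. */
    size_t expected = 0;
    for (size_t i = 0; i < sizeof shapes / sizeof shapes[0]; i++) {
        if (strlen(shapes[i].name) == name_len &&
            strncmp(shapes[i].name, line, name_len) == 0) {
            expected = shapes[i].corners;
            break;
        }
    }
    if (expected == 0 || n_letters != expected)
        return 0;

    /* One bit per letter A-H; a bit that is already set means a duplicate. */
    unsigned seen = 0;
    for (size_t i = 0; i < n_letters; i++) {
        char c = letters[i];
        if (c < 'A' || c > 'H')
            return 0;
        unsigned bit = 1u << (c - 'A');
        if (seen & bit)
            return 0;
        seen |= bit;
    }
    return 1;
}
```

The bitmask does in a few lines what a regular expression can only do by enumeration, which is why the duplicate check is the awkward part of a pure (f)lex solution.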

Still, as shown by the question asked by one of your classmates, it can be done with (f)lex by taking advantage of the scanner's ordering rules:

  1. Always use the longest match.
  2. If two or more rules match the same longest text, choose the first one.
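As a rough model of these two rules (not how flex is implemented internally), the selection can be sketched in C. pick_rule is a hypothetical name, and match_len[i] is assumed to hold the number of characters that rule i would match at the current position, with 0 meaning the rule does not match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the scanner's choice among n_rules rules, listed in
 * the order they appear in the rules section. Returns the index of the
 * chosen rule, or -1 if no rule matches (where a real scanner would fall
 * back to its default rule). */
int pick_rule(const size_t *match_len, size_t n_rules)
{
    assert(match_len != NULL);

    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < n_rules; i++) {
        /* Strictly greater: a later rule only wins with a longer match,
         * so ties go to the rule listed first. */
        if (match_len[i] > best_len) {
            best_len = match_len[i];
            best = (int)i;
        }
    }
    return best;
}
```

With this model, a rule that rejects a bad input only needs to match at least as much text as the accepting rule and be listed before it.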

That doesn't get around the question of duplicate characters. The only way to solve that is to enumerate all possibilities, of which there are eight in this case. A simpler way of doing that than that proposed in the linked question is [A-H]*A[A-H]*A[A-H]*|[A-H]*B[A-H]*B[A-H]*|[A-H]*C[A-H]*C[A-H]*....
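The eight-way enumeration can be tried outside of flex. The sketch below builds the full alternation with a loop and checks it with POSIX regcomp/regexec; it assumes a POSIX system that provides <regex.h>, and build_dups_pattern and has_duplicate are hypothetical names that do not appear in the answer. The pattern is anchored with ^ and $ here because regexec searches for a match anywhere in the string.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Writes "^([A-H]*A[A-H]*A[A-H]*|...|[A-H]*H[A-H]*H[A-H]*)$" into buf.
 * Returns the pattern length, or -1 if buf is too small. */
int build_dups_pattern(char *buf, size_t size)
{
    assert(buf != NULL);

    int n = snprintf(buf, size, "^(");
    if (n < 0 || (size_t)n >= size)
        return -1;
    size_t used = (size_t)n;

    /* One alternative per letter: that letter appears at least twice. */
    for (char c = 'A'; c <= 'H'; c++) {
        n = snprintf(buf + used, size - used, "%s[A-H]*%c[A-H]*%c[A-H]*",
                     c == 'A' ? "" : "|", c, c);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, size - used, ")$");
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}

/* Returns 1 if `letters` (a string over A-H) repeats a letter, 0 if it does
 * not, and -1 if the pattern could not be built or compiled. */
int has_duplicate(const char *letters)
{
    assert(letters != NULL);

    char pattern[256];
    regex_t re;
    if (build_dups_pattern(pattern, sizeof pattern) < 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, letters, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

In a flex file the same alternation would be written out by hand as a named definition, since flex patterns are fixed when the scanner is generated.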

That lets you create an ordered set of rules, something like this:

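  1. Match lines with duplicate characters.
  2. Match lines with too many characters.
  3. Match lines with the correct number of characters.
  4. Anything else is an error (too few characters, invalid shape name, invalid letter, and so on).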

So the rules might look like this (leaving out the definitions of the two macros, {dups} and {valid}, which are straightforward but tedious to write):

  /* 1. Dups */
[a-z]+\ {dups}$  { err("Duplicate letter"); }
  /* 2. Too long */
{valid}[A-H]+$   { err("Too long"); }
  /* 3. Just right */
{valid}$         { printf("Valid: %s\n", yytext); }
  /* 4. Anything else */
.+               { err("Too short or invalid character"); }
  /* Ignore newlines */
\n               ;
