Is it possible to set priorities for rules to avoid the "longest-earliest" matching pattern?


Question

Another simple question: is there any way to tell flex to prefer a rule that matches a short thing over a rule that matches a longer thing? I can't find any good documentation about that.

Here is why I need that: I parse a file for a pseudo-language that contains some keywords corresponding to control instructions. I'd like them to have absolute priority so that they're not parsed as parts of an expression. I need this priority mechanism because I don't want to write a full grammar for my project (that would be total overkill in my case: I only perform structural analysis on the parsed program, so I don't need to know the details), which means I can't rely on fine grammar tuning to make sure those blocks won't be parsed into an expression.

Any help will be appreciated.

Here is an example of a parsed file:

If a > 0 Then read(b); Endif
c := "If I were...";
While d > 5 Do d := d + 1 Endwhile

I just want to collect info on the Ifs, Thens, Endifs, etc. The rest doesn't matter to me. That's why I'd like the rules for If, Then, and so on to be prioritized without having to write a grammar.

Recommended answer

From the Dragon Book, 2nd edition, Section 3.5.3, "Conflict Resolution in Lex":

We have alluded to the two rules that Lex uses to decide on the proper lexeme
to select, when several prefixes of the input match one or more patterns:
    1. Always prefer a longer prefix to a shorter prefix.
    2. If the longest possible prefix matches two or more patterns, prefer the
       pattern listed first in the Lex program.

The rules above also apply to flex. Here is what the flex manual says (Chapter 7: How the Input Is Matched):

When the generated scanner is run, it analyzes its input looking for strings 
which match any of its patterns. If it finds more than one match, it takes the 
one matching the most text (for trailing context rules, this includes the length 
of the trailing part, even though it will then be returned to the input). If it 
finds two or more matches of the same length, the rule listed first in the flex 
input file is chosen.
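
The two disambiguation rules quoted above can be illustrated outside flex with a small C++ sketch. This is only a simulation under stated assumptions, not how flex is implemented: `Rule` and `pick_rule` are hypothetical names introduced here, and `std::regex` stands in for flex patterns (it is assumed that each pattern's greedy match at the current position is also its longest match, which holds for simple patterns like the ones in this question).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one scanner rule: a token name plus its pattern.
struct Rule {
    std::string name;
    std::regex pattern;
};

// Tries every rule at position `pos` of `input` and returns the winner's name,
// storing the matched length in `len`. Returns "NONE" if no rule matches.
//   Rule 1: the longest match wins.
//   Rule 2: on a tie, the rule listed first wins.
std::string pick_rule(const std::vector<Rule>& rules, const std::string& input,
                      std::size_t pos, std::size_t& len) {
    assert(pos <= input.size());
    std::string winner = "NONE";
    len = 0;
    for (const Rule& r : rules) {
        std::smatch m;
        // match_continuous forces the match to start exactly at `pos`.
        if (std::regex_search(input.begin() + pos, input.end(), m, r.pattern,
                              std::regex_constants::match_continuous)) {
            const std::size_t n = static_cast<std::size_t>(m.length(0));
            // Strictly longer only: an equal-length match from a later rule
            // never replaces an earlier one, which is the tie-breaking rule.
            if (n > len) {
                len = n;
                winner = r.name;
            }
        }
    }
    return winner;
}
```

With a rule list of `{"IF", std::regex("If")}` followed by `{"IDENTIFIER", std::regex("[a-zA-Z_][a-zA-Z0-9_]*")}`, calling `pick_rule` on `"If a"` at position 0 selects `IF` (both rules match two characters, and the keyword is listed first), while on `"If_this_is_an_identifier"` it selects `IDENTIFIER`, because that match is longer.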

If I understood correctly, your lexer treats keywords like Endif as identifiers, so they are later considered part of an expression. If this is your problem, simply put the keyword rules at the top of your specification, as in the following (assume each uppercase word is a predefined enum value corresponding to a token):

"If"                      { return IF;         }
"Then"                    { return THEN;       }
"Endif"                   { return ENDIF;      }
"While"                   { return WHILE;      }
"Do"                      { return DO;         }
"Endwhile"                { return ENDWHILE;   }
\"(\\.|[^\\"])*\"         { return STRING;     }
[a-zA-Z_][a-zA-Z0-9_]*    { return IDENTIFIER; }

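A keyword on its own is then always matched by its keyword rule rather than by the identifier rule: both rules match the same length, so rule 2 (the rule listed first wins) decides.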

Edit:

Thank you for your comment, kol. I forgot to add the rule for strings. But I don't think my solution is wrong. For example, given an identifier called If_this_is_an_identifier, rule 1 applies, so the identifier rule takes effect (since it matches the longest string). I wrote a simple test case and saw no problem with my solution. Here is my lex.l file:

%{
  #include <iostream>
  using namespace std;
%}

ID       [a-zA-Z_][a-zA-Z0-9_]*

%option noyywrap
%%

"If"                      { cout << "IF: " << yytext << endl;         }
"Then"                    { cout << "THEN: " << yytext << endl;       }
"Endif"                   { cout << "ENDIF: " << yytext << endl;      }
"While"                   { cout << "WHILE: " << yytext << endl;      }
"Do"                      { cout << "DO: " << yytext << endl;         }
"Endwhile"                { cout << "ENDWHILE: " << yytext << endl;   }
\"(\\.|[^\\"])*\"         { cout << "STRING: " << yytext << endl;     }
{ID}                      { cout << "IDENTIFIER: " << yytext << endl; }
.                         { cout << "Ignore token: " << yytext << endl; }

%%

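int main(int argc, char* argv[]) {
  ++argv, --argc;  /* skip over program name */
  if ( argc > 0 )
    yyin = fopen( argv[0], "r" );  /* read from the file named on the command line */
  else
    yyin = stdin;                  /* otherwise read from standard input */

  yylex();
  return 0;
}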

I tested my solution with the following test case:

If If_this_is_an_identifier > 0 Then read(b); Endif
    c := "If I were...";
While While_this_is_also_an_identifier > 5 Do d := d + 1 Endwhile

It gives me the following output (output not relevant to the problem you mentioned is omitted):

IF: If
IDENTIFIER: If_this_is_an_identifier
......
STRING: "If I were..."
......
WHILE: While
IDENTIFIER: While_this_is_also_an_identifier

The lex.l program is adapted from an example in the flex manual, which uses the same method to distinguish keywords from identifiers:
http://flex.sourceforge.net/manual/Simple-Examples.html#Simple-Examples

Also have a look at the Lex specification for the ANSI C grammar:
http://www.lysator.liu.se/c/ANSI-C-grammar-l.html

I also used this approach in a personal project, and so far I haven't found any problems.
