Flex default rule


Question

How do I customize the default action in flex? I found something like <*>, but when I run the scanner it reports "flex scanner jammed". The . rule does not work either, because it only adds a rule. What I want is:

comment               "/*"[^"*/"]*"*/"

%%
{comment}             return 1;
{default}             return 0; 
<<EOF>>               return -1;

Is it possible to change the matching behavior from longest match to first match? If so, I would do something like this:

default               (.|\n)*

But because this almost always gives a longer match, it would hide the comment rule.

Edit

I found the {-} operator in the manual. However, this example, taken straight from the manual, gives me "unrecognized rule":

[a-c]{-}[b-z]

Recommended answer

The flex default rule matches a single character and prints it on standard output. If you don't want that action, write an explicit rule which matches a single character and does something else.
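As a rough sketch of that advice, the specification below keeps the return codes from the question and puts a catch-all rule last. The comment pattern is a commonly used one that I have swapped in for the one in the question, and the main function is only a hypothetical driver added so the file builds on its own:

```lex
%option noyywrap

%{
#include <stdio.h>
%}

%%
"/*"([^*]|\*+[^*/])*\*+"/"    { return 1; }    /* a complete C-style comment */
.|\n                          { return 0; }    /* any other single character */
<<EOF>>                       { return -1; }   /* end of input */
%%

/* Hypothetical driver: print one code per token until end of input. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != -1)
        printf("%d", tok);
    putchar('\n');
    return 0;
}
```

Because flex prefers the longest match, a complete comment still wins over the one-character rule; the catch-all only fires when nothing longer matches at the current position.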

The pattern (.|\n)* matches the entire input file as a single token, so it is a very bad idea. You're thinking that the default should be a long match, but in fact you want it to be as short as possible (but not empty).

The purpose of the default rule is to do something when there is no match for any of the tokens in the input language. When lex is used for tokenizing a language, such a situation is almost always an error, because it means that the input begins with a character which is not the start of any valid token of the language.

Thus, a "catch any character" rule is coded as a form of error recovery. The idea is to discard the bad character (just one) and try tokenizing from the character after it. This is only a guess, but it's a good guess because it's based on what is known: namely, that there is one bad character in the input.

The recovery rule can be wrong. For instance, suppose that no token of the language begins with @, and the programmer wanted to write the string literal "@abc", but forgot the opening " and wrote @abc". The right fix is to insert the missing ", not to discard the @. But that would require a much more clever set of rules in the lexer.

Anyway, when discarding a bad character, you usually want to issue an error message such as "skipping invalid character '~' in line 42, column 3".
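One way such a recovery rule might look is sketched below. The line and column counters, the identifier rule, and the main function are hypothetical, added only to make the example a complete file; a real lexer would have its own token rules and would return tokens to a parser:

```lex
%option noyywrap

%{
#include <stdio.h>
static int line = 1, column = 1;   /* hypothetical position counters */
%}

%%
\n            { line++; column = 1; }
[ \t]+        { column += yyleng; }
[A-Za-z_]+    { column += yyleng; /* a real lexer would return a token here */ }
.             { fprintf(stderr,
                        "skipping invalid character '%c' in line %d, column %d\n",
                        yytext[0], line, column);
                column++; }
%%

int main(void)
{
    yylex();
    return 0;
}
```

The catch-all comes last and consumes exactly one character, so scanning resumes at the very next character after the bad one.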

The default rule/action of copying the unmatched character to standard output is useful when lex is used for text filtering. The default rule then brings about the semantics of a regex search (as opposed to a regex match): the idea is to search the input for matches of the lexer's token-recognizing state machine, while printing all material that is skipped by that search.

So, for instance, a lex specification containing just the rule:

 "foo" { printf("bar"); }

implements the equivalent of:

 sed -e 's/foo/bar/g'
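For completeness, here is what the whole file might look like as a standalone filter. The %option line, the include, and the main function are my additions so that it builds without linking the lex library; the file name used afterwards is likewise just an example:

```lex
%option noyywrap

%{
#include <stdio.h>
%}

%%
"foo"    { printf("bar"); }   /* everything else falls through to the default rule */
%%

int main(void)
{
    yylex();   /* reads standard input, writes standard output */
    return 0;
}
```

Assuming the file is saved as filter.l, it can be built with flex filter.l followed by cc lex.yy.c -o filter, and then used like sed in a pipeline: input text passes through unchanged except that each foo becomes bar.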
