Flex如何区分A,AB和ABC? [英] How does Flex distinguish between A, AB, and ABC?

查看:185
本文介绍了Flex如何区分A,AB和ABC?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对Flex进行了此实验,以查看是否输入ABC,是否将看到所有A,AB,ABC或仅看到ABC或仅看到表达式列表中的第一个匹配项.

I made this experiment for Flex to see if I enter ABC, if it will see all A, AB, ABC or only ABC or only the first match in the list of expressions.

%{
#include <stdio.h>
%}

%%
A puts("got A");
AB puts("got AB");
ABC puts("got ABC");
%%

int main(int argc, char **argv)
{
    yylex();

    return 0;
}

当我在编译并运行程序后进入ABC时,它以"Got ABC"响应,这让我感到非常惊讶,因为我认为lex不会跟踪访问的文本,只会找到第一个匹配项.但实际上,它似乎找到了最长的匹配项.

When I enter ABC after compiling and running the program, it responds with "Got ABC" which really surprises me since I thought lex doesn't keep track of visited text, and only finds the first match; but actually, it seems to find the longest match.

当且仅当不再匹配时,Flex会使用什么策略来响应A?

What strategy does Flex use to respond to A if and only if there is no longer match?

推荐答案

(F)lex使用

The fact that (F)lex uses the maximal-munch principle should hardly be surprising, since it is well documented in the Flex manual:

运行生成的扫描程序时,它将分析其输入以查找与任何模式匹配的字符串.如果找到多个匹配项,则采用匹配最多的text….如果找到两个或两个以上相同长度的匹配项,则选择在弹性输入文件中最先列出的规则. (如何匹配输入"部分的第一段)

When the generated scanner is run, it analyzes its input looking for strings which match any of its patterns. If it finds more than one match, it takes the one matching the most text…. If it finds two or more matches of the same length, the rule listed first in the flex input file is chosen. (First paragraph of the section "How the input is matched")

精确的算法极其简单:每次请求令牌时,flex都会扫描文本,并在DFA中移动.每次达到接受状态时,它都会记录当前文本位置.如果无法再进行转换,它将返回到上一个记录的接受位置,并成为令牌的结尾.

The precise algorithm is exceedingly simple: every time a token is requested, flex scans the text, moving through the DFA. Every time it hits an accepting state, it records the current text position. When no more transitions are possible, it returns to the last recorded accept position, and that becomes the end of the token.

结果是(F)lex可以多次扫描相同的文本,尽管每个令牌仅扫描一次.

The consequence is that (F)lex can scan the same text multiple times, although it only scans once for each token.

一组需要大量回溯的词汇规则将减慢词汇扫描的速度. Flex手册的性能注意事项中对此进行了讨论避免问题的策略.但是,除了病理情况外,回溯的开销并不明显.

A set of lexical rules which require excessive back-tracking will slow down the lexical scan. This is discussed in the Flex manual section Performance Considerations, along with some strategies to avoid the issue. However, except in pathological cases, the overhead from back-tracking is not noticeable.

这篇关于Flex如何区分A,AB和ABC?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆