如何实施最大咀嚼? [英] How is maximal-munch implemented?

查看:111
本文介绍了如何实施最大咀嚼?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究编译器,并且正在学习词法分析.我知道可以将每个词素指定为正则表达式,并且使用flex可以自动生成词法分析器.我正在进一步学习如何将正则表达式转换为NFA,然后再将其转换为DFA,并在其中进行快速评估.

I am studying compilers and am learning about lexical analysis. I understand that one specifies each lexeme as a regular expression, and using flex, a lexer can be automatically generated. I am further learning about how the regular expression is converted to an NFA which is then converted to a DFA, where it can be quickly evaluated.

但是,我的问题是,最大咀嚼规则是如何实施的?在内部,词法分析器如何知道继续前进"以找到最长的词素?

However, my question is, how is the maximal-munch rule implemented? Internally, how does the lexer know to "keep going" to find the longest possible lexeme?

谢谢!

推荐答案

通过在DFA执行器中添加少量可变状态,并添加DFA执行器回滚"输入的功能,可以实现最大的munch算法. :实际上,为它提供了tell()seek()之类的功能.

The maximal munch algorithm is implemented by adding a small amount of mutable state to the DFA executor, and adding the ability of the DFA executor to "rewind" the input: in effect, providing it with functions like tell() and seek().

从过渡功能不完整的意义上来说,还值得注意的是DFA不完整.某些{state, input}对没有定义的结果. [注2]

It's also worth noting that the DFA is not complete, in the sense that the transition function is not complete. Some {state, input} pairs do not have a defined result. [Note 2]

请记住,算法如下:

Set Accepted NFA State to ⊥
Set Accepted Position to Tell(Input Stream)
Set State to Starting State
Repeat:
  If State ∈ Accepting:
    Set Accepted NFA State to Accepting NFA State for State  [Note 1]
    Set Accepted Position to Tell(Input Stream)
  Read one symbol from Input Stream into Next Symbol
  If there is a transition from {State, Next Symbol} to New State:
    Set State to New State
    Continue the loop
  Otherwise:
    Rewind Input Stream to Accepted Position
    Return Accepted NFA State

如果算法返回&bottom ;,则没有识别出令牌,输入流将退回到初始位置.

If the algorithm returns ⊥, then no token was recognized and the input stream will have been rewound to the initial position.

注意:

  1. NFA通常在状态和接受动作之间具有明确的同态性,但是DFA构造算法可以将两个具有不同动作的接受NFA状态组合在一起.在这种情况下,flex算法将优先考虑输入文件中的第一个动作.在上述算法中,我们通过将每个接受DFA状态映射到具有优先级的接受NFA状态的组件来表示这一点.

  1. The NFA typically has an unambiguous homomorphism between states and accepting actions, but the DFA construction algorithm may combine two accepting NFA states with different actions. In this case, the flex algorithm is to give priority to the first action in the input file. In the above algorithm, we represent this by mapping each accepting DFA state to the component accepting NFA state which has priority.

通过添加额外的(且唯一的)sink状态(不接受且仅具有自身的过渡)来轻松完成DFA.然后,我们可以将sink状态添加为任何其他未指定转换的转换.如果我们将sink状态称为⊥这样就很清楚如何修改所提供的算法;在实践中,这根本没有必要,因为在实践中,我们并不关心DFA是不完整的.但是,它确实对状态最小化算法产生了一些影响.

It's easy to make the DFA complete by adding an additional (and unique) sink state which is non-accepting and which only has transitions to itself. Then we can add the sink state as a transition for any otherwise unspecified transition. If we call the sink state ⊥ then it will be clear how to modify the algorithm provided; in practice, this isn't at all necessary because in practice we don't care that the DFA is incomplete. It does have some impact on state minimization algorithms, though.

这篇关于如何实施最大咀嚼?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆