在解析器组合器中组合词法分析器和分析器 [英] Combining lexer and parser in a parser combinator

查看:103
本文介绍了在解析器组合器中组合词法分析器和分析器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 uu-parsinglib ,但我认为以下问题是解析器combinator generic。



让我们考虑下面的例子:

我有一个使用combinator pLex 的词法分析器,它会生成一个列表令牌(类型 MyToken )。我现在想编写一个解析器,它将消耗这些标记并构建一个 AST



什么是最好的连接词法分析器和解析器的方法?现在我有一个 lex 函数:

  lex s = parse( (,)< $> pLex * pEnd)(createStr(LineColPos 0 0 0)s)

我应该创建一个函数 parse p = ... ?如果是的话,我该如何构建它来跟踪词法分析器的列和行?或者我应该创建一个 parserCombinator ,它会以某种方式使用 pLex 组合器?


<基于表的解析器需要分离词法分析和解析,因为它们的预见能力有限。展望未来,将词法分析结合到解析器中将会爆炸状态空间。

基于组合器的方法通常不会遇到这个问题,因为它们通常在执行递归 - 下降解析。除非图书馆作者另有说明,否则合并阶段并没有什么坏处,分开它们并没有多大好处。



尽管uu-parsinglib提供了 Str 类来对不同的字符串类型的输入进行抽象,看它的定义表明它仍然假定你最终读取一个Char序列,无论它们来自String,ByteString,Text,等等。所以试图让它解析一个MyToken流似乎可能是困难的。如果你觉得你需要这样做,Parsec可能是一个更好的选择。

至于你关于字符串实现的问题,combinators接受一个包含语法结构的字符串式输入,返回相应的语义值,如果它们匹配的话。在组合器内部,通过组合来自您所调用的子组合器的语义值,您可以从输入流中直接解析的内容构建语义值。

因此,在您的示例中,您的'字符串匹配'组合器将在其作用域中包含一个令牌列表,这要归功于它的解析。您可以使用Haskell的全部功能将这些标记合并到一个MyString值中,以任何方式适合您的语言:也许是一个表示将要切入哪些值的SplicedString类型。



字符串组合器可能由一个表达式组合器调用,该组合器将能够将MyString值与其他解析值组合为MyExpression值。它是组合者一路返回语义值!


I'm using uu-parsinglib, but I think the following question is parser combinator generic.

Let's consider the following example:

I've got a lexer with a combinator pLex, which produces a list of tokens (of type MyToken). I now want to write a parser, which will consume the tokens and build an AST.

What is the best way to connect the lexer and parser? Right now I have a lex function:

lex s = parse ( (,) <$> pLex <*> pEnd) (createStr (LineColPos 0 0 0) s)

Should I create a function parse p = ...? If yes, how do I construct it to keep track of columns and lines from lexer? Or should I create a parserCombinator, which would use the pLex combinator somehow?

解决方案

Table-based parsers require separation of lexical analysis and parsing because of their limited lookahead capability. Looking ahead far enough to combine lexical analysis into the parser would explode the state space.

Combinator-based approaches do not usually suffer this problem, as they are typically doing recursive-descent parsing. Unless otherwise noted by the library author, there is no harm in combining the phases and not much to gain by separating them.

Although uu-parsinglib provides the Str class to abstract over different string-like inputs, looking at its definition shows that it still assumes that you are ultimately reading a sequence of Char, whether they be from a String, ByteString, Text, etc. So trying to get it to parse a MyToken stream seems like it could be difficult. Parsec might be a better choice if you feel you need to do that.

As to your question about your string implementation, combinators take a string-like input containing syntactic structure and return the corresponding semantic value, if they match. Inside the combinator, you get to build that semantic value from what you parse directly by taking from the input stream and by combining the semantic values from sub-combinators you call.

So, your 'String matching' combinator in your example will have a list of tokens in its scope thanks to the parsing it did. You can use the full power of Haskell to combine those tokens into a single MyString value in whatever way makes sense for your language: Maybe a 'SplicedString' type that represents what values are to be sliced into it.

The string combinator was probably called by an 'expression' combinator, which will be able to combine the MyString value with other parsed values into a MyExpression value. It's combinators returning semantic values all the way back up!

这篇关于在解析器组合器中组合词法分析器和分析器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆