Which Haskell parsing technology is most pleasant to use, and why?

Question

"Pleasant" meaning, for example: you can write grammars in a "natural" way without having to rewrite them in a convoluted way, and without having to introduce boring boilerplate.

Let's stipulate for the purposes of this question that, unless the performance of a technology is pathologically bad, performance isn't the biggest issue here.

Having said that, you might want to mention whether a technology falls down by forcing you to rewrite a grammar for performance reasons.

Please give me an idea of the size and complexity of grammars you have worked with, when answering this question. Also, whether you have used any notable "advanced" features of the technology in question, and what your impressions of those were.

Of course, the answer to this question may depend on the domain, in which case, I'd be happy to learn this fact.

Solution

It really depends on what you start with and what you want to do. There isn't a one-size-fits-all answer.

If you have an LR grammar (e.g. you are working from a Yacc grammar), it is a good deal of work to turn it into an LL one suitable for Parsec or uu-parsinglib. Combinators such as many, sepBy, etc. help a lot here, but you should expect the resulting parser to be slower than Happy+Alex.
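As a rough sketch of what that conversion involves, assuming the parsec package and a made-up Yacc-style grammar, the left-recursive rules below are rewritten with Parsec's chainl1 and sepBy. The grammar and the names expr, term, exprList and parseExprs are illustrative only.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical Yacc-style grammar being ported:
--   list : list ',' expr | expr ;
--   expr : expr '-' term | term ;
--   term : NUMBER | '(' expr ')' ;
-- Transliterating "expr : expr '-' term" directly would recurse forever in an
-- LL combinator library, so the left recursion is replaced by chainl1, which
-- still builds a left-associative result.

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: Char -> Parser Char
symbol c = lexeme (char c)

number :: Parser Integer
number = lexeme (read <$> many1 digit)

term :: Parser Integer
term = number <|> between (symbol '(') (symbol ')') expr

-- expr : expr '-' term | term, expressed without left recursion.
expr :: Parser Integer
expr = term `chainl1` ((-) <$ symbol '-')

-- list : list ',' expr | expr, expressed with sepBy.
exprList :: Parser [Integer]
exprList = spaces *> (expr `sepBy` symbol ',') <* eof

parseExprs :: String -> Either ParseError [Integer]
parseExprs = parse exprList "<input>"
```

For example, parseExprs "1 - 2 - 3" yields Right [-4], i.e. (1 - 2) - 3, showing that the rewrite preserves the left associativity of the original rule.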

For LL combinator parsing, uu-parsinglib and its predecessor uu-parsing are nice, but they lack anything like Parsec's Token and Language modules, so they are perhaps less convenient. Some people like Malcolm Wallace's Parselib because it has a different backtracking model from Parsec's, but I have no experience with it.
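To give a feel for that convenience, here is a minimal sketch assuming parsec 3's Text.Parsec.Token and Text.Parsec.Language modules. The toy "let name = integer" language and the names langDef, lexer, assignment and parseProgram are hypothetical. The lexical conventions are described once in a LanguageDef, and makeTokenParser derives whitespace- and comment-aware lexeme parsers from it.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Lexical conventions for a hypothetical toy language:
-- "--" line comments, identifiers of letters/digits/underscores,
-- "let" as a reserved word and "=" as a reserved operator.
langDef :: Tok.LanguageDef ()
langDef = emptyDef
  { Tok.commentLine     = "--"
  , Tok.identStart      = letter
  , Tok.identLetter     = alphaNum <|> char '_'
  , Tok.reservedNames   = ["let"]
  , Tok.reservedOpNames = ["="]
  }

-- Record of ready-made lexeme parsers (identifier, integer, semiSep1, ...).
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

-- let <identifier> = <integer>
assignment :: Parser (String, Integer)
assignment = do
  Tok.reserved lexer "let"
  name <- Tok.identifier lexer
  Tok.reservedOp lexer "="
  value <- Tok.integer lexer
  return (name, value)

-- One or more assignments separated by ';', with leading whitespace/comments skipped.
program :: Parser [(String, Integer)]
program = Tok.whiteSpace lexer *> Tok.semiSep1 lexer assignment <* eof

parseProgram :: String -> Either ParseError [(String, Integer)]
parseProgram = parse program "<input>"
```

Here parseProgram "let x = 1; -- first\nlet y_2 = 42" returns Right [("x",1),("y_2",42)] without any hand-written whitespace or comment handling, and using a reserved word as a name (let let = 1) is rejected by Tok.identifier.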

If you are decoding some formatted file rather than something like a programming language, Attoparsec or similar might be better than Parsec or uu-parsinglib. Better in this context means faster: not just because of ByteString vs. Char, but also because I think Attoparsec does less work on error handling and source-location tracking, so its parsers should run faster as they do less work per input element.

Also, bear in mind that text file formats don't always have grammars as such, so you might have to define some custom combinators to do special lexical tricks rather than just defining a parser combinator for each element.
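A small sketch of such a custom combinator, assuming parsec and a hypothetical CSV-like format in which a doubled quote inside a quoted field stands for one literal quote. That escape rule is a lexical convention rather than a grammar rule, so it gets its own combinator (quotedField); all names here are illustrative.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical CSV-like format: fields separated by ',', each record ended by '\n'.
-- A field may be wrapped in double quotes; inside quotes, commas and newlines are
-- ordinary characters and "" stands for a single literal '"'.

quotedField :: Parser String
quotedField = between (char '"') (char '"') (many quotedChar)
  where
    -- Any non-quote character, or a doubled quote decoded to one '"'.
    -- 'try' lets a lone closing quote fall through to 'between' untouched.
    quotedChar :: Parser Char
    quotedChar = noneOf "\"" <|> try (string "\"\"" >> return '"')

-- An unquoted field runs up to the next separator (possibly empty).
bareField :: Parser String
bareField = many (noneOf ",\"\n")

field :: Parser String
field = quotedField <|> bareField

record :: Parser [String]
record = field `sepBy1` char ','

csv :: Parser [[String]]
csv = record `endBy` newline <* eof

parseCsv :: String -> Either ParseError [[String]]
parseCsv = parse csv "<input>"
```

With this, the two-line input a,b followed by "x,y","he said ""hi""" parses to [["a","b"],["x,y","he said \"hi\""]]: the comma inside quotes stays part of the field and each "" collapses to one quote.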

For LR parsing, I found Ralf Hinze's Frown to be nicer than Happy: better error support and a nicer format for grammar files. However, Frown is not actively maintained and isn't on Hackage. I think it is LR(k) rather than LR(1), which makes it more powerful with respect to lookahead.

Performance is not really a big concern as far as the grammar goes. Programming languages have complex grammars, but you can expect fairly small input files. As for data file formats, it really behoves the designer of the format to design it so that it can be parsed efficiently. For combinator parsers you shouldn't need many advanced features for a data format file: if you do, either the format is badly designed (this unfortunately happens sometimes) or your parser is.

For the record, I've written a C parser with Frown, a GLSL (OpenGL Shading Language) parser with Happy, an unfinished C parser with uu-parsing, and many things with Parsec. For me, the choice depends on what I start with: for an LR grammar, Frown or Happy (now Happy, as Frown isn't maintained); otherwise, usually Parsec (as I said, uu-parsing is nice but lacks the convenience of LanguageDef). For binary formats I roll my own, but I usually have special requirements.
