Preserving comments in `Text.Parsec.Token` tokenizers

Question

I'm writing a source-to-source transformation using parsec, so I have a LanguageDef for my language and I build a TokenParser for it using Text.Parsec.Token.makeTokenParser:

myLanguage = LanguageDef { ...
  commentStart = "/*"
  , commentEnd = "*/"
  ...
}

-- defines 'stringLiteral', 'identifier', etc...
TokenParser {..} = makeTokenParser myLanguage

Unfortunately, since I defined commentStart and commentEnd, each of the parser combinators in the TokenParser is a lexeme parser implemented in terms of whiteSpace, and whiteSpace discards comments as well as spaces.

What is the right way to preserve comments in this situation?

Approaches I can think of:

  1. Don't define commentStart and commentEnd. Wrap each of the lexeme parsers in another combinator that grabs comments before parsing each token.
  2. Implement my own version of makeTokenParser (or perhaps use some library that generalizes Text.Parsec.Token; if so, which library?)
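
A minimal sketch of approach 1, assuming non-nested `/* ... */` comments. The names `lexer`, `blockComment`, `withComments` and `identifier'` are made up for illustration, and `emptyDef` stands in for the real language definition because it leaves the comment delimiters empty:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Token parser built WITHOUT comment delimiters (emptyDef leaves them
-- empty), so its whiteSpace skips only plain whitespace.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- One /* ... */ comment: returns the text between the delimiters and
-- skips trailing whitespace. Nested comments are not handled.
blockComment :: Parser String
blockComment = Tok.lexeme lexer $
  try (string "/*") *> manyTill anyChar (try (string "*/"))

-- Hypothetical wrapper: collect every comment that precedes a token.
withComments :: Parser a -> Parser ([String], a)
withComments p = (,) <$> many blockComment <*> p

-- Example: an identifier together with its leading comments.
identifier' :: Parser ([String], String)
identifier' = withComments (Tok.identifier lexer)
```

With these definitions, `parse identifier' "" "/* a */ /* b */ foo"` yields the two comment texts together with the identifier `foo`. Comments that follow the last token of the input would still need to be collected separately.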

What's the done thing in this situation?

Solution

In principle, defining commentStart and commentEnd doesn't fit with preserving comments, because you need to treat comments as valid parts of both the source and the target language, and include them in your grammar and your AST/ADT.

In this way, you'd be able to keep the text of the comment as the payload data of a Comment constructor, and output it appropriately in the target language, something like

data Statement = Comment String | Return Expression | ......

It makes no difference to your translation code that neither the source nor the target language attaches any meaning to the comment text.
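
As a rough illustration of this idea, the sketch below parses comments into `Comment` nodes and prints them back out. Everything beyond the `Comment` and `Return` constructors is an assumption made for the example: the toy `Expression` type, the `return x;` statement syntax, and the helper names `lexeme`, `program` and `render`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy AST (hypothetical): an expression is just a variable name here.
newtype Expression = Var String deriving (Show, Eq)
data Statement = Comment String | Return Expression deriving (Show, Eq)

-- Skip whitespace (but not comments) after a token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A comment is an ordinary statement whose payload is its text.
comment :: Parser Statement
comment = lexeme $
  Comment <$> (try (string "/*") *> manyTill anyChar (try (string "*/")))

-- Assumed statement syntax for the example: return <name>;
returnStmt :: Parser Statement
returnStmt = do
  _ <- lexeme (try (string "return" <* notFollowedBy alphaNum))
  e <- lexeme (Var <$> many1 letter)
  _ <- lexeme (char ';')
  pure (Return e)

program :: Parser [Statement]
program = spaces *> many (comment <|> returnStmt) <* eof

-- Emit each statement in a (here identical) target syntax.
render :: Statement -> String
render (Comment t)      = "/*" ++ t ++ "*/"
render (Return (Var v)) = "return " ++ v ++ ";"
```

Because comments are statements in this grammar, they are only accepted where a statement may appear; a comment in the middle of an expression would need its own place in the AST.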


Major problem with this approach: It doesn't really fit well with makeTokenParser, and fits better with implementing your source language's parser from the ground up.

I guess I'm veering towards editing makeTokenParser to just get the comment parsers to return the String instead of ().
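
One way this could look, written as a standalone sketch rather than an actual patch to `makeTokenParser`: a whitespace parser that returns the comment texts it skips instead of `()`, plus a matching lexeme combinator. The names `whiteSpaceC` and `lexemeC` are hypothetical, and only non-nested `/* ... */` comments are handled.

```haskell
import Data.Maybe (catMaybes)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Like whiteSpace, but returns the text of every comment it skips.
whiteSpaceC :: Parser [String]
whiteSpaceC = catMaybes <$> many (blanks <|> commentText)
  where
    blanks      = Nothing <$ skipMany1 space
    commentText = Just <$> (try (string "/*") *> manyTill anyChar (try (string "*/")))

-- Like lexeme, but pairs the token with the comments that follow it.
lexemeC :: Parser a -> Parser (a, [String])
lexemeC p = (,) <$> p <*> whiteSpaceC
```

For example, `parse (lexemeC (many1 letter)) "" "foo /* a */  /* b */"` returns `foo` paired with both comment texts. Every token parser built on `lexemeC` then has a different result type from its `Text.Parsec.Token` counterpart, which is the main cost of this route.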
