Preserving comments in `Text.Parsec.Token` tokenizers
Problem description
I'm writing a source-to-source transformation using parsec, so I have a `LanguageDef` for my language and I build a `TokenParser` for it using `Text.Parsec.Token.makeTokenParser`:
myLanguage = LanguageDef { ...
commentStart = "/*"
, commentEnd = "*/"
...
}
-- defines 'stringLiteral', 'identifier', etc...
TokenParser {..} = makeTokenParser myLanguage
Unfortunately, since I defined `commentStart` and `commentEnd`, each of the parser combinators in the `TokenParser` is a lexeme parser implemented in terms of `whiteSpace`, and `whiteSpace` eats spaces as well as comments.
What is the right way to preserve comments in this situation?
Approaches I can think of:
- Don't define `commentStart` and `commentEnd`. Wrap each of the lexeme parsers in another combinator that grabs comments before parsing each token.
- Implement my own version of `makeTokenParser` (or perhaps use some library that generalizes `Text.Parsec.Token`; if so, which library?)
What's the done thing in this situation?
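The first approach listed above could look roughly like the following. This is only a minimal sketch, assuming non-nested `/* ... */` comments and using `emptyDef` as a stand-in for a language definition that leaves the comment fields empty. The names `blockComment` and `withComments` are hypothetical helpers, not part of parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- emptyDef leaves commentStart, commentEnd and commentLine empty,
-- so the generated whiteSpace skips only blank characters.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical helper: parse one /* ... */ comment and return its text.
-- Nested comments are not handled.
blockComment :: Parser String
blockComment = try (string "/*") *> manyTill anyChar (try (string "*/"))

-- Hypothetical wrapper: collect the comments that precede a token,
-- then run the underlying lexeme parser.
withComments :: Parser a -> Parser ([String], a)
withComments p = do
  cs <- many (blockComment <* Tok.whiteSpace lexer)
  x  <- p
  return (cs, x)

-- A comment-preserving version of the generated identifier parser.
identifier :: Parser ([String], String)
identifier = withComments (Tok.identifier lexer)
```

With these definitions, `parse identifier "" "/* a */ foo"` should yield `Right ([" a "],"foo")`. The cost of this design is that every lexeme parser taken from the `TokenParser` has to be wrapped by hand, and every caller has to decide what to do with the collected comments.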
In principle, defining `commentStart` and `commentEnd` doesn't fit with preserving comments, because you need to treat comments as valid parts of both the source and the target language, and include them in your grammar and your AST/ADT.
In this way, you'd be able to keep the text of the comment as the payload data of a `Comment` constructor, and output it appropriately in the target language, something like:
data Statement = Comment String | Return Expression | ...
The fact that neither source nor target language sees the comment text as relevant is irrelevant for your translation code.
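To make the idea concrete, here is a small sketch of a grammar in which a comment is just another statement. Everything beyond `Comment` and `Return` is an assumption made for illustration: `Expression` is reduced to a variable name, the toy syntax is `return <name>;`, and `render` simply re-emits the same syntax as a stand-in for the target language.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST: Expression is reduced to a variable name here.
newtype Expression = Var String deriving (Show, Eq)

data Statement
  = Comment String
  | Return Expression
  deriving (Show, Eq)

-- A comment is an ordinary statement whose payload is the comment text.
commentStmt :: Parser Statement
commentStmt =
  Comment <$> (try (string "/*") *> manyTill anyChar (try (string "*/")))

-- Assumed toy syntax: return <name>;
returnStmt :: Parser Statement
returnStmt = do
  _    <- try (string "return" <* many1 space)
  name <- many1 letter
  spaces
  _    <- char ';'
  return (Return (Var name))

-- Comments and statements are parsed by the same grammar rule.
program :: Parser [Statement]
program = spaces *> many ((commentStmt <|> returnStmt) <* spaces) <* eof

-- Emit the target language; here it is simply the same syntax again.
render :: Statement -> String
render (Comment text)      = "/*" ++ text ++ "*/"
render (Return (Var name)) = "return " ++ name ++ ";"
```

Because the comment lives in the AST, the transformation passes can move or drop it deliberately instead of losing it in the lexer. The sketch only accepts comments between statements; comments inside expressions would need their own place in the grammar.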
Major problem with this approach: it doesn't really fit well with `makeTokenParser`, and fits better with implementing your source language's parser from the ground up.
I guess I'm veering towards editing `makeTokenParser` to just get the comment parsers to return the `String` instead of `()`.
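The core of such an edit might look like the sketch below. It is not the actual `makeTokenParser` source, just a hand-rolled pair of replacements for `whiteSpace` and `lexeme`, assuming non-nested `/* ... */` comments. The primed names are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical replacement for whiteSpace: skips blanks like the original,
-- but returns the text of every /* ... */ comment it passes over.
whiteSpace' :: Parser [String]
whiteSpace' = spaces *> many (comment <* spaces)
  where
    comment = try (string "/*") *> manyTill anyChar (try (string "*/"))

-- Hypothetical replacement for lexeme: pairs the token with the
-- comments that trail it.
lexeme' :: Parser a -> Parser (a, [String])
lexeme' p = (,) <$> p <*> whiteSpace'

-- Example lexeme parser built on top of lexeme'.
identifier' :: Parser (String, [String])
identifier' = lexeme' ((:) <$> letter <*> many alphaNum)
```

For example, `parse identifier' "" "foo /* c */ bar"` should yield `Right ("foo",[" c "])`. Since the result type of every lexeme parser changes, the record returned by `makeTokenParser` could not be reused unchanged; a modified copy of the module would need its own record type.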