Canonicalizing token text in ANTLR
Question
Is there a way in ANTLR to mark certain tokens as having canonical output?
For example, given the grammar (excerpt):
words : FOO BAR BAZ ;
FOO : [Ff] [Oo] [Oo] ;
BAR : [Bb] [Aa] [Rr] ;
BAZ : [Bb] [Aa] [Zz] ;
SP : [ ] -> channel(HIDDEN);
words will match "FOO BAR BAZ", "foo bar baz", "Foo bAr baZ", etc.
When I call TokenStream#getText(Context), it returns the tokens' actual text concatenated together.
Is there a way to "canonicalize" this output so that, no matter what the input, all FOO tokens render as "Foo", BAR tokens render as "Bar", and BAZ tokens render as "Baz" (for example)?
Given any of the inputs above, I'd like to have the output "Foo Bar Baz".
Recommended answer
Any of the following options would work:
Implement your own method to obtain the text for a parse tree or range of tokens, and place the handling for certain known token types there.
Create your own Token class that knows to return the canonical form of certain tokens, and create a TokenFactory implementation that creates tokens of that type. Then use the setTokenFactory method to cause your lexer to produce those tokens.
Create your own TokenStream implementation that overrides the default behavior.
Explicitly specify the text in an action that runs prior to the creation of tokens:
FOO : [Ff] [Oo] [Oo] { _text = "Foo"; };
Other options may also be available.
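As a rough illustration of the first option (your own text-rendering method), here is a minimal sketch in plain Java. It does not use the ANTLR runtime: the Tok class, the token type constants, and the canonicalText method are all hypothetical stand-ins made up for this example, and a generated lexer would supply its own token types.

```java
import java.util.List;
import java.util.Map;

/** Sketch of option 1: render a token range with canonical text per token type. */
final class CanonicalText {
    // Hypothetical token type constants; a generated ANTLR lexer defines its own.
    static final int FOO = 1, BAR = 2, BAZ = 3, SP = 4;

    /** Minimal stand-in for a lexer token: just a type and the matched text. */
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Canonical spelling for the token types that have one.
    static final Map<Integer, String> CANONICAL =
            Map.of(FOO, "Foo", BAR, "Bar", BAZ, "Baz");

    /** Concatenates token text, substituting the canonical form where one is defined. */
    static String canonicalText(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            // Tokens without a canonical form (e.g. SP) keep their original text.
            sb.append(CANONICAL.getOrDefault(t.type, t.text));
        }
        return sb.toString();
    }
}
```

With the real runtime, the same loop would presumably run over the tokens covered by the context's source interval, reading each token's getType() and getText() instead of the stand-in fields. Hidden-channel tokens such as SP are still part of the token stream, so their text passes through unchanged.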