Canonicalizing token text in ANTLR

Question

Is there a way in ANTLR to mark certain tokens as having canonical output?

For example, given the grammar (excerpt)

words : FOO BAR BAZ ;
FOO : [Ff] [Oo] [Oo] ;
BAR : [Bb] [Aa] [Rr] ;
BAZ : [Bb] [Aa] [Zz] ;
SP : [ ] -> channel(HIDDEN) ;

words will match "FOO BAR BAZ", "foo bar baz", "Foo bAr baZ", etc.

When I call TokenStream#getText(Context), it'll return the tokens' actual text concatenated together.

Is there a way to "canonicalize" this output such that no matter what the input, all FOO tokens render as "Foo", BAR tokens render as "Bar", and BAZ tokens render as "Baz" (for example)?

Given any of the inputs above, I'd like to have the output "Foo Bar Baz".

Recommended answer

Any of the following options would work:

  1. Implement your own method to obtain the text for a parse tree or range of tokens, and place the handling for certain known token types there.

  2. Create your own Token class that knows to return the canonical form of certain tokens, and create a TokenFactory implementation that creates tokens of that type. Then call the setTokenFactory method on your lexer so that it produces those tokens.

  3. Create your own TokenStream implementation that overrides the default behavior.

  4. Explicitly specify the text in an action that runs before the token is created:

FOO : [Ff] [Oo] [Oo] { _text = "Foo"; };

Other options may also be available.
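To illustrate the first option (a custom text-rendering method), here is a minimal sketch in plain Java. It deliberately avoids the ANTLR runtime so that it can run on its own: the Tok class, the token type numbers, and the render method are all hypothetical stand-ins, not ANTLR API. In a real project you would presumably walk the tokens of your TokenStream instead and read each one's type and text via Token#getType() and Token#getText().

```java
import java.util.List;
import java.util.Map;

final class CanonicalText {
    // Hypothetical token type numbers; a generated ANTLR lexer defines its own constants.
    static final int FOO = 1, BAR = 2, BAZ = 3, SP = 4;

    // Minimal stand-in for a lexer token: a type plus the text actually matched.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Canonical spellings for the token types that have one.
    static final Map<Integer, String> CANONICAL =
            Map.of(FOO, "Foo", BAR, "Bar", BAZ, "Baz");

    // Concatenates token text, substituting the canonical form where one is known
    // and falling back to the matched text otherwise (for example, whitespace).
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            sb.append(CANONICAL.getOrDefault(t.type, t.text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> input = List.of(
                new Tok(FOO, "fOO"), new Tok(SP, " "),
                new Tok(BAR, "bAr"), new Tok(SP, " "),
                new Tok(BAZ, "baZ"));
        System.out.println(render(input)); // prints "Foo Bar Baz"
    }
}
```

Keeping the mapping in one table means the lexer and the token stream stay untouched, at the cost of every caller having to use this method rather than TokenStream#getText.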
