Canonicalizing token text in ANTLR

Question

Is there a way in ANTLR to mark certain tokens as having canonical output?

For example, given the grammar (excerpt)

words : FOO BAR BAZ ;
FOO : [Ff] [Oo] [Oo] ;
BAR : [Bb] [Aa] [Rr] ;
BAZ : [Bb] [Aa] [Zz] ;
SP : [ ] -> channel(HIDDEN) ;

words will match "FOO BAR BAZ", "foo bar baz", "Foo bAr baZ", etc.

When I call TokenStream#getText(Context), it'll return the tokens' actual text concatenated together.

Is there a way to "canonicalize" this output such that no matter what the input, all FOO tokens render as "Foo", BAR tokens render as "Bar", and BAZ tokens render as "Baz" (for example)?

Given any of the inputs above, I'd like to have the output "Foo Bar Baz".

Recommended answer

Any of the following options would work:

  1. Implement your own method to obtain the text for a parse tree or range of tokens, and place the handling for certain known token types there.

  2. Create your own Token class that knows to return the canonical form of certain tokens, and create a TokenFactory implementation that creates tokens of that type. Then use the setTokenFactory method to make your lexer produce those tokens.

  3. Create your own TokenStream implementation that overrides the default behavior.

  4. Explicitly specify the text in a lexer action that runs before the token is created:

FOO : [Ff] [Oo] [Oo] { _text = "Foo"; };

Other options may be available as well.
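As an illustration of option 1, here is a minimal sketch in Java of a custom text-building method that substitutes a canonical spelling for known token types. It deliberately does not use the ANTLR runtime: SimpleToken, the numeric token type constants, and canonicalText are hypothetical stand-ins invented for this example. With the real runtime you would presumably loop over the tokens covered by a context and read each token's type and text through the Token interface instead.

```java
import java.util.List;
import java.util.Map;

final class CanonicalText {
    // Hypothetical token type numbers; a generated lexer defines its own constants.
    static final int FOO = 1, BAR = 2, BAZ = 3, SP = 4;

    // Stand-in for a token: only the type and the text as it appeared in the input.
    static final class SimpleToken {
        final int type;
        final String text;

        SimpleToken(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Canonical spelling per token type. Types not listed keep their original text.
    static final Map<Integer, String> CANONICAL =
            Map.of(FOO, "Foo", BAR, "Bar", BAZ, "Baz");

    // Concatenates token text, replacing known token types with their canonical form.
    static String canonicalText(List<SimpleToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (SimpleToken t : tokens) {
            sb.append(CANONICAL.getOrDefault(t.type, t.text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hidden-channel whitespace is kept as a token so it passes through unchanged.
        List<SimpleToken> tokens = List.of(
                new SimpleToken(FOO, "fOO"), new SimpleToken(SP, " "),
                new SimpleToken(BAR, "bAr"), new SimpleToken(SP, " "),
                new SimpleToken(BAZ, "baZ"));
        System.out.println(canonicalText(tokens)); // prints "Foo Bar Baz"
    }
}
```

Keeping the mapping in one table means the grammar itself stays untouched, at the cost of every caller having to use this method rather than the stream's own getText.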
