Antlr4 doesn't correctly recognize Unicode characters

Views: 436 Published: 2020/9/3 0:25:33 Tag: antlr4


Question

I have a very simple grammar that tries to match 'é' to the token E_CODE. I tested it with the TestRig tool (using the -tokens option), but the input is not matched correctly. My input file is encoded in UTF-8 without a BOM, and I am using ANTLR 4.4. Could somebody else check this as well? I got this output on my console:
line 1:0 token recognition error at: 'Ă'

grammar Unicode;

stat:EOF;  
E_CODE: '\u00E9' | 'é';

Recommended answer

I tested the grammar:

grammar Unicode;

stat: E_CODE* EOF;

E_CODE: '\u00E9' | 'é';

as follows:

UnicodeLexer lexer = new UnicodeLexer(new ANTLRInputStream("\u00E9é"));
UnicodeParser parser = new UnicodeParser(new CommonTokenStream(lexer));
System.out.println(parser.stat().getText());

and the following got printed to my console:

éé<EOF>

Tested with 4.2 and 4.3 (4.4 isn't in Maven Central yet).

Looking at the source, I see that TestRig takes an optional -encoding parameter. Have you tried setting it?
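To illustrate why the encoding setting could matter here, the following is a minimal sketch using only the Java standard library (no ANTLR involved). It decodes the UTF-8 bytes of 'é' with a different charset, which is what a reader does when it falls back to a platform default that is not UTF-8. The class name EncodingMismatchDemo and the helper misdecode are hypothetical names chosen for this example. That the asker's platform default was windows-1250 is an assumption, inferred only from the 'Ă' in the error message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encode the text as UTF-8, then decode those bytes with another charset,
    // mimicking a reader that assumes the wrong file encoding.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // 'é', stored in UTF-8 as the two bytes C3 A9

        // ISO-8859-1 maps each byte to one character: U+00C3 (Ã) and U+00A9 (©).
        String latin1 = misdecode(eAcute, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 sees " + latin1.length() + " chars: " + latin1);

        // windows-1250 is not guaranteed on every JVM, so check before using it.
        // Assumption: this was the asker's default charset; it maps byte C3 to U+0102 (Ă).
        if (Charset.isSupported("windows-1250")) {
            String cp1250 = misdecode(eAcute, Charset.forName("windows-1250"));
            System.out.println("windows-1250 sees " + cp1250.length() + " chars: " + cp1250);
        }

        // Decoding with the correct charset returns the original single character.
        String correct = misdecode(eAcute, StandardCharsets.UTF_8);
        System.out.println("UTF-8 sees " + correct.length() + " char: " + correct);
    }
}
```

If the input file really is UTF-8 and the mismatch is the cause, passing -encoding UTF-8 to TestRig, as suggested above, should make the lexer see a single 'é' instead of two unrelated characters.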
