ANTLR on a noisy data stream Part 2

Problem description

Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I've run into another problem...

The aim is still the same: extract only the useful information, using the following grammar:

    VERB            : 'SLEEPING' | 'WALKING';
    SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
    INDIRECT_OBJECT : 'CAR' | 'SOFA';
    ANY             : . {skip();};

    parse
      :  sentenceParts+ EOF
      ;

    sentenceParts
      :  SUBJECT VERB INDIRECT_OBJECT
      ;

A sentence like

    it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV.

produces a single sentenceParts match: CAT (SUBJECT), SLEEPING (VERB) and SOFA (INDIRECT_OBJECT).

This is perfect and does exactly what I want: from a long sentence, I extract only the words that are meaningful to me. But then I found the following problem: if I introduce, anywhere in the text, a word that begins exactly like a token, I end up with a MismatchedTokenException or a NoViableAltException. For example, the input


    it's 10PM and the Lazy CAT is currently SLEEPING heavily, 
    with a DOGGY bag, on the SOFA in front of the TV.

produces an error.

DOGGY is interpreted as starting with DOG, which is one of the alternatives of the SUBJECT token, and the lexer gets lost. How can I avoid this without defining DOGGY as a special token? I would like the parser to treat DOGGY as a word in itself.
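To see where the parser rules give up, assume the lexer turns DOGGY into a DOG token followed by the skipped characters G and Y, so the parser receives CAT SLEEPING DOG SOFA. The Java class below is a hypothetical hand-written recognizer for `parse` and `sentenceParts`, not the code ANTLR generates; the class name, the `check` helper and the error-message wording are made up for illustration.

```java
import java.util.List;

// Hypothetical hand-written recognizer mirroring the question's parser rules:
//   parse         : sentenceParts+ EOF ;
//   sentenceParts : SUBJECT VERB INDIRECT_OBJECT ;
// It only sees token type names; characters skipped by ANY never reach it.
final class SentencePartsRecognizer {

    private final List<String> tokens;
    private int pos = 0;

    private SentencePartsRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns null when the tokens match, otherwise a description of the first mismatch. */
    static String check(List<String> tokens) {
        try {
            new SentencePartsRecognizer(tokens).parse();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // parse : sentenceParts+ EOF ;
    private void parse() {
        sentenceParts();                    // '+' requires at least one
        while (peek().equals("SUBJECT")) {  // every further sentenceParts starts with SUBJECT
            sentenceParts();
        }
        match("EOF");
    }

    // sentenceParts : SUBJECT VERB INDIRECT_OBJECT ;
    private void sentenceParts() {
        match("SUBJECT");
        match("VERB");
        match("INDIRECT_OBJECT");
    }

    // Next token type, or "EOF" once the input is exhausted.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    private void match(String expected) {
        String actual = peek();
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                    "mismatched input " + actual + " at token " + pos + ", expecting " + expected);
        }
        pos++;
    }

    public static void main(String[] args) {
        // CAT SLEEPING SOFA: matches, prints "null"
        System.out.println(check(List.of("SUBJECT", "VERB", "INDIRECT_OBJECT")));
        // CAT SLEEPING DOG SOFA: prints "mismatched input SUBJECT at token 2, expecting INDIRECT_OBJECT"
        System.out.println(check(List.of("SUBJECT", "VERB", "SUBJECT", "INDIRECT_OBJECT")));
    }
}
```

Under this assumption the failure is a parser-level mismatch: after SUBJECT VERB the rule needs INDIRECT_OBJECT, but the extra SUBJECT produced from DOGGY arrives first.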

Solution

Well, it seems that adding this rule solves my problem!

    ANY2 : 'A'..'Z'+ {skip();};
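One way to see why ANY2 helps is a simplified model of the lexer: at each position every rule is tried, the longest match wins, and when two rules match the same number of characters the rule listed first wins. This assumes ANTLR's lexer behaves like a plain longest-match lexer for these particular rules, which simplifies its lookahead-based prediction. `NoisyLexerSketch`, `tokenize` and the `withAny2` flag are hypothetical names, not ANTLR API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical model of the lexer: at each position every rule is tried,
// the longest match wins, and on a tie the rule listed first wins.
// Rules marked skip behave like {skip();} and emit nothing.
final class NoisyLexerSketch {

    record Rule(String name, Pattern pattern, boolean skip) {}

    static List<Rule> rules(boolean withAny2) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("VERB", Pattern.compile("SLEEPING|WALKING"), false));
        rules.add(new Rule("SUBJECT", Pattern.compile("CAT|DOG|BIRD"), false));
        rules.add(new Rule("INDIRECT_OBJECT", Pattern.compile("CAR|SOFA"), false));
        if (withAny2) {
            // ANY2 : 'A'..'Z'+ {skip();} ; placed after the keyword rules so they win ties
            rules.add(new Rule("ANY2", Pattern.compile("[A-Z]+"), true));
        }
        // ANY : . {skip();} ; DOTALL lets '.' match line breaks too
        rules.add(new Rule("ANY", Pattern.compile(".", Pattern.DOTALL), true));
        return rules;
    }

    static List<String> tokenize(String input, boolean withAny2) {
        List<Rule> rules = rules(withAny2);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern().matcher(input).region(pos, input.length());
                // strict '>' keeps the earlier rule when two rules match the same length
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;
                    bestLen = m.end() - pos;
                }
            }
            // ANY matches any single character, so best is never null here
            if (!best.skip()) {
                tokens.add(best.name() + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "it's 10PM and the Lazy CAT is currently SLEEPING heavily, "
                + "with a DOGGY bag, on the SOFA in front of the TV.";
        // [SUBJECT(CAT), VERB(SLEEPING), SUBJECT(DOG), INDIRECT_OBJECT(SOFA)]
        System.out.println(tokenize(text, false));
        // [SUBJECT(CAT), VERB(SLEEPING), INDIRECT_OBJECT(SOFA)]
        System.out.println(tokenize(text, true));
    }
}
```

In this model DOGGY becomes a five-character ANY2 match that beats the three-character DOG, so the whole word is skipped, while CAT, SLEEPING and SOFA tie with ANY2 and go to the keyword rules because those are listed first. If the real lexer resolves ties the same way, ANY2 should be defined after VERB, SUBJECT and INDIRECT_OBJECT in the grammar. A side effect is that any uppercase word that merely extends a keyword, such as CATS or CARPET, is also skipped whole.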
