ANTLR on a noisy data stream Part 2
Problem description
Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I've run into another problem.
The aim is still the same: extract only the useful information, using the following grammar:
VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';
ANY : . {skip();};
parse
: sentenceParts+ EOF
;
sentenceParts
: SUBJECT VERB INDIRECT_OBJECT
;
A sentence like
it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV.
will produce the following
This is perfect, and it does exactly what I want: from a long sentence, I extract only the words that are meaningful to me. But then I found the following problem. If somewhere in the text I introduce a word that begins exactly like a token, I end up with a MismatchedTokenException or a NoViableAltException. For example,
it's 10PM and the Lazy CAT is currently SLEEPING heavily, with a DOGGY bag, on the SOFA in front of the TV.
produces an error:
DOGGY is interpreted as the beginning of DOG, which is one of the alternatives of the SUBJECT token, and the lexer gets lost. How can I avoid this without defining DOGGY as a special token? I would like the parser to treat DOGGY as a word in itself.
Well, it seems that adding the rule ANY2 : 'A'..'Z'+ {skip();}; solves my problem!
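One plausible way to see why the extra rule helps is to imitate the lexer by hand. The Java sketch below is not ANTLR's actual algorithm and does not use the ANTLR runtime. It is a simplified longest-match tokenizer, written under the assumption that a run of capital letters longer than a keyword should win over that keyword, and that a tie goes to the keyword. The class name `LongestMatchDemo` and the method `tokenize` are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: a hand-written stand-in for the lexer rules in the question.
class LongestMatchDemo {
    // Keyword text -> token type, copied from the VERB, SUBJECT and INDIRECT_OBJECT rules.
    static final Map<String, String> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("SLEEPING", "VERB");
        KEYWORDS.put("WALKING", "VERB");
        KEYWORDS.put("CAT", "SUBJECT");
        KEYWORDS.put("DOG", "SUBJECT");
        KEYWORDS.put("BIRD", "SUBJECT");
        KEYWORDS.put("CAR", "INDIRECT_OBJECT");
        KEYWORDS.put("SOFA", "INDIRECT_OBJECT");
    }

    // Returns the token types the parser would see; skipped text produces nothing.
    // withWordRule = true imitates adding ANY2 : 'A'..'Z'+ {skip();};
    static List<String> tokenize(String input, boolean withWordRule) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Longest keyword that starts at pos, if any.
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, String> e : KEYWORDS.entrySet()) {
                String kw = e.getKey();
                if (kw.length() > bestLen && input.startsWith(kw, pos)) {
                    bestLen = kw.length();
                    bestType = e.getValue();
                }
            }
            // Length of the run of capital letters at pos (the ANY2 stand-in).
            int wordLen = 0;
            if (withWordRule) {
                while (pos + wordLen < input.length()
                        && input.charAt(pos + wordLen) >= 'A'
                        && input.charAt(pos + wordLen) <= 'Z') {
                    wordLen++;
                }
            }
            if (bestType != null && bestLen >= wordLen) {
                types.add(bestType);   // keyword wins when it covers the whole word
                pos += bestLen;
            } else if (wordLen > 0) {
                pos += wordLen;        // whole capitalized word skipped, like ANY2
            } else {
                pos++;                 // single character skipped, like ANY
            }
        }
        return types;
    }

    public static void main(String[] args) {
        String s = "it's 10PM and the Lazy CAT is currently SLEEPING heavily, "
                 + "with a DOGGY bag, on the SOFA in front of the TV.";
        System.out.println(tokenize(s, false)); // [SUBJECT, VERB, SUBJECT, INDIRECT_OBJECT]
        System.out.println(tokenize(s, true));  // [SUBJECT, VERB, INDIRECT_OBJECT]
    }
}
```

Without the word rule, the DOG inside DOGGY surfaces as an extra SUBJECT, which no longer fits SUBJECT VERB INDIRECT_OBJECT. With it, DOGGY is consumed as a single skipped word. In the real grammar it is probably safest to declare ANY2 after the keyword rules, on the assumption that ANTLR prefers the earlier rule when two rules match the same text.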