ANTLR on a noisy data stream: Part 2
Problem description
Following a very interesting discussion with Bart Kiers on parsing a noisy data stream with ANTLR, I'm ending up with another problem...
The aim is still the same: to extract only the useful information, using the following grammar:
VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';
ANY : . {skip();};
parse
: sentenceParts+ EOF
;
sentenceParts
: SUBJECT VERB INDIRECT_OBJECT
;
A sentence like "it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV." will produce the following:
This is perfect and does exactly what I want: from a big sentence, I extract only the words that carry meaning for me. But then I found the following error: if somewhere in the text I introduce a word that begins exactly like a token, I end up with a MismatchedTokenException or a NoViableAltException. For example,
it's 10PM and the Lazy CAT is currently SLEEPING heavily,
with a DOGGY bag, on the SOFA in front of the TV.
produces an error:
DOGGY is interpreted as the beginning of DOG, which is also the token SUBJECT, and the lexer gets lost... How can I avoid this without defining DOGGY as a special token? I would like the parser to understand DOGGY as a word in itself.
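The failure and its fix both come down to how the lexer picks tokens. The idea behind the eventual fix is maximal munch: if a rule can consume a whole uppercase run like DOGGY, it wins over the shorter keyword match DOG, and the run can then be skipped. This is not ANTLR itself, just a minimal Python sketch of that longest-match-then-filter idea (the function and token names are illustrative):

```python
import re

# Keyword tokens taken from the grammar in the question.
KEYWORDS = {
    "SLEEPING": "VERB", "WALKING": "VERB",
    "CAT": "SUBJECT", "DOG": "SUBJECT", "BIRD": "SUBJECT",
    "CAR": "INDIRECT_OBJECT", "SOFA": "INDIRECT_OBJECT",
}

def tokenize(text):
    """Consume whole uppercase runs (longest match), then keep only the
    runs that are exact keywords and silently skip everything else."""
    tokens = []
    for word in re.findall(r"[A-Z]+", text):
        if word in KEYWORDS:                      # exact keyword -> token
            tokens.append((KEYWORDS[word], word))
        # otherwise the whole word (e.g. "DOGGY") is skipped, which is
        # what the ANY2 : 'A'..'Z'+ {skip();}; rule achieves in ANTLR
    return tokens

sentence = ("it's 10PM and the Lazy CAT is currently SLEEPING heavily, "
            "with a DOGGY bag, on the SOFA in front of the TV.")
print(tokenize(sentence))
# → [('SUBJECT', 'CAT'), ('VERB', 'SLEEPING'), ('INDIRECT_OBJECT', 'SOFA')]
```

Because DOGGY is consumed as one five-character run before any keyword comparison happens, the prefix DOG never gets a chance to be mis-recognized as SUBJECT.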
Answer
Well, it seems that adding this: ANY2 : 'A'..'Z'+ {skip();};
solves my problem!
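For reference, the combined grammar with the extra rule might look like this (a sketch, assuming ANTLR 3 syntax since skip() is used; the grammar name is made up). Note that the keyword rules should stay declared before ANY2: when two lexer rules match input of the same length (e.g. the exact word DOG), ANTLR breaks the tie in favor of the rule listed first, while a longer uppercase run such as DOGGY goes to ANY2 by longest match and is skipped:

```antlr
grammar Noisy;

parse
  : sentenceParts+ EOF
  ;

sentenceParts
  : SUBJECT VERB INDIRECT_OBJECT
  ;

// Keywords first: on an exact-length tie (input "DOG"), the rule
// listed first wins, so DOG is still lexed as SUBJECT.
VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';

// A longer uppercase run (e.g. "DOGGY") wins by longest match
// and is skipped instead of being mis-lexed as SUBJECT + noise.
ANY2 : 'A'..'Z'+ {skip();};
ANY  : . {skip();};
```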