ANTLR on a noisy data stream


Question

I'm very new to the ANTLR world and I'm trying to figure out how I can use this parsing tool to interpret a set of "noisy" strings. What I would like to achieve is the following.

Let's take this phrase as an example: It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV

What I would like to extract is CAT, SLEEPING and SOFA, and to have a grammar that easily matches the following pattern: SUBJECT - VERB - INDIRECT OBJECT, where I could define

VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';

etc. I don't want to end up with a permanent "NoViableException", as I can't describe every possibility in the language structure. I just want to discard the useless words and keep only the ones that are interesting.

It's more as if I had a tokeniser and asked the parser: "OK, read the stream until you find a SUBJECT, then ignore the rest until you find a VERB, and so on."

I need to extract an organized structure from an unorganized set. For example, I would like to be able to interpret both of these orderings (I'm not judging the pertinence of this utterly basic and incorrect view of English grammar):
SUBJECT - VERB - INDIRECT OBJECT
INDIRECT OBJECT - SUBJECT - VERB

so I will parse sentences like

It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV

or

It's 10PM and, on the SOFA in front of the TV, the Lazy CAT is currently SLEEPING heavily

Recommended answer

You could define just a few lexer rules (the ones you posted, for example) and, as the last lexer rule, match any single character and skip() it:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';
ANY             : . {skip();};

The order is important here: the lexer tries to match tokens from top to bottom, so if it can't match any of the tokens VERB, SUBJECT or INDIRECT_OBJECT, it "falls through" to the ANY rule, which matches a single character and skips it. You can then use these parser rules to filter your input stream:

parse
  :  sentenceParts+ EOF
  ;

sentenceParts
  :  SUBJECT VERB INDIRECT_OBJECT
  ;  

This will parse the input text:

It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV. The DOG is WALKING on the SOFA.

as two sentenceParts, CAT SLEEPING SOFA and DOG WALKING SOFA, followed by EOF.
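
To see what the parser is left with once the noise is skipped, the filtering step can be imitated in plain Java. This is only a rough sketch and does not use ANTLR: the class name NoisyFilter and the method tokens are hypothetical, and a regular expression stands in for the three lexer rules, with Matcher.find() stepping over unmatched characters much as the ANY rule does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the lexer above; not ANTLR-generated code.
public class NoisyFilter {
    // One named group per lexer rule; any text outside these keywords is ignored.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<VERB>SLEEPING|WALKING)|(?<SUBJECT>CAT|DOG|BIRD)|(?<OBJECT>CAR|SOFA)");

    /** Returns "TYPE:text" for every keyword found, in input order. */
    public static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) { // find() skips characters that match no keyword
            String type = m.group("VERB") != null ? "VERB"
                        : m.group("SUBJECT") != null ? "SUBJECT"
                        : "INDIRECT_OBJECT";
            out.add(type + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens(
            "It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA "
            + "in front of the TV. The DOG is WALKING on the SOFA."));
    }
}
```

For the second word order from the question (on the SOFA ... the Lazy CAT ... SLEEPING), the same sketch yields the tokens in the order INDIRECT_OBJECT, SUBJECT, VERB, so the sentenceParts rule would need a second alternative to accept it. Like the lexer rules, the sketch matches keywords anywhere in the text, including inside longer words.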
