ANTLR on a noisy data stream

Question

I'm very new to the ANTLR world and I'm trying to figure out how I can use this parsing tool to interpret a set of "noisy" strings. What I would like to achieve is the following.

Let's take this phrase as an example: It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV

What I would like to extract is CAT, SLEEPING and SOFA, using a grammar that easily matches the following pattern: SUBJECT - VERB - INDIRECT OBJECT, where I could define

VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';

etc. I don't want to end up with a permanent "NoViableAltException" because I can't describe every possible structure of the language. I just want to strip out the useless words and keep only the ones that are interesting.

It's more as if I had a tokenizer and asked the parser: "OK, read the stream until you find a SUBJECT, then ignore the rest until you find a VERB, etc."

I need to extract an organized structure from an unorganized set. For example, I would like to be able to interpret both of the following orderings (I'm not judging the pertinence of this utterly basic and incorrect view of "English grammar"):
SUBJECT - VERB - INDIRECT OBJECT
INDIRECT OBJECT - SUBJECT - VERB

so that I can parse sentences like

It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV

or

It's 10PM and, on the SOFA in front of the TV, the Lazy CAT is currently SLEEPING heavily

Recommended answer

You could define just a few lexer rules (the ones you posted, for example) and, as the last lexer rule, match any character and skip() it:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD';
INDIRECT_OBJECT : 'CAR'| 'SOFA';
ANY             : . {skip();};

The order is important here: the lexer tries to match tokens from top to bottom, so if it can't match any of the tokens VERB, SUBJECT or INDIRECT_OBJECT, it "falls through" to the ANY rule and skips this token. You can then use these parser rules to filter your input stream:

parse
  :  sentenceParts+ EOF
  ;

sentenceParts
  :  SUBJECT VERB INDIRECT_OBJECT
  ;  

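With these rules, the input text:

It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV. The DOG is WALKING on the SOFA.

is parsed into two sentenceParts subtrees, one containing CAT SLEEPING SOFA and one containing DOG WALKING SOFA, followed by EOF.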
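To make the filtering behaviour easier to see without generating an ANTLR parser, here is a rough plain-Python sketch of the same idea. It is not ANTLR and does not use the ANTLR runtime: the names TOKEN_SPEC, TOKEN_RE, tokenize and parse are hypothetical, and a regular expression with a catch-all branch stands in for the ANY rule. It only assumes the three keyword lists and the single SUBJECT VERB INDIRECT_OBJECT ordering from the grammar above.

```python
import re

# Token types and their keywords, mirroring the three lexer rules above.
TOKEN_SPEC = [
    ("VERB", ("SLEEPING", "WALKING")),
    ("SUBJECT", ("CAT", "DOG", "BIRD")),
    ("INDIRECT_OBJECT", ("CAR", "SOFA")),
]

# One named group per token type, then a catch-all branch that plays the
# role of the ANY rule (assumption: any single character, newlines included).
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{'|'.join(words)})" for name, words in TOKEN_SPEC)
    + "|(?P<ANY>.)",
    re.DOTALL,
)


def tokenize(text):
    """Return (type, text) pairs, dropping everything matched by ANY."""
    return [
        (m.lastgroup, m.group())
        for m in TOKEN_RE.finditer(text)
        if m.lastgroup != "ANY"
    ]


def parse(text):
    """Group the kept tokens into SUBJECT VERB INDIRECT_OBJECT triples."""
    tokens = tokenize(text)
    expected = ("SUBJECT", "VERB", "INDIRECT_OBJECT")
    if not tokens or len(tokens) % 3 != 0:
        raise ValueError("expected one or more SUBJECT VERB INDIRECT_OBJECT groups")
    sentences = []
    for i in range(0, len(tokens), 3):
        group = tokens[i:i + 3]
        if tuple(kind for kind, _ in group) != expected:
            raise ValueError(f"unexpected token order: {group}")
        sentences.append(tuple(word for _, word in group))
    return sentences


if __name__ == "__main__":
    text = (
        "It's 10PM and the Lazy CAT is currently SLEEPING heavily on the "
        "SOFA in front of the TV. The DOG is WALKING on the SOFA."
    )
    print(parse(text))
```

Note that, like the sentenceParts rule it imitates, this sketch accepts only the SUBJECT - VERB - INDIRECT OBJECT ordering. The second ordering from the question (INDIRECT OBJECT - SUBJECT - VERB) is rejected here and would presumably need an extra alternative in the parser rule.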
