Antlr: Simplest way to recognize dates and numbers?

Question

What is the simplest (shortest, fewest rules, and no warnings) way to parse both valid dates and numbers in the same grammar? My problem is that a lexer rule to match a valid month (1-12) will match any occurrence of 1-12. So if I just want to match a number, I need a parse rule like:

number: (MONTH|INT);
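The overlap can be reproduced outside ANTLR. The Python sketch below is a lex-style simplification, not ANTLR's actual lexer: it assumes longest match wins and that, on a tie, the rule listed first wins. The MONTH pattern is hypothetical, since the question does not show its MONTH rule. Under those assumptions, "12" lexes as MONTH while "13" lexes as INT, and even a day such as "5" comes out as MONTH. That is why a plain number ends up needing `(MONTH|INT)` in the parser.

```python
import re

# Hypothetical lexer rules in priority order. The question does not show its
# MONTH rule; this one accepts 1-12 without a leading zero.
RULES = [
    ("MONTH", re.compile(r"1[0-2]|[1-9]")),
    ("INT", re.compile(r"[0-9]+")),
    ("/", re.compile(r"/")),
]


def tokenize(text):
    """Lex-style tokenizer: longest match wins; on a tie, the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens


print(tokenize("12/5"))  # the day "5" is lexed as MONTH, not INT
```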

It only gets more complex when I add lexer rules for day and year. I want a parse rule for date like this:

date: month '/' day ( '/' year )? -> ^('DATE' year? month day);

I don't care whether month, day, and year are parser or lexer rules, as long as I end up with the same tree structure. I also need to be able to recognize numbers elsewhere, e.g.:

foo: STRING OP number -> ^(OP STRING number);
STRING: ('a'..'z')+;
OP: ('<'|'>');

Recommended answer

The problem is that you seem to want to perform both syntactic and semantic checking in your lexer and/or parser. It's a common mistake, and it only works for very simple languages.

What you really need to do is accept input more broadly in the lexer and parser, and then perform semantic checks afterwards. How strict your lexer is remains up to you, but you have two basic options, depending on whether you need to accept leading zeros on days of the month: 1) be very permissive with your INTs, or 2) define DATENUM to accept only tokens that are valid days but not valid INTs (a zero followed by a nonzero digit). I recommend the second, because fewer semantic checks will be needed later in the code: INTs can then be verified at the syntax level, and you only need to perform semantic checks on your dates. The first approach:

INT: '0'..'9'+;

The second approach:

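DATENUM: '0' '1'..'9';
INT: '0' | SIGN? '1'..'9' '0'..'9'*;
// SIGN is not defined in the original answer; this sign fragment is an assumption.
fragment SIGN: '+' | '-';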
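To see which lexemes each rule of the second approach claims, the rules can be transcribed into Python regular expressions. This is a sketch that classifies one whole lexeme at a time rather than running a real lexer, and it assumes SIGN means '+' or '-'. Zero-padded days such as "07" become DATENUM, ordinary numbers stay INT, and oddities like "00" or "007" match neither rule, so they are rejected at the lexing level.

```python
import re

# Second approach, transcribed into regular expressions.
# Assumption: SIGN (undefined in the answer) is '+' or '-'.
DATENUM = re.compile(r"0[1-9]")
INT = re.compile(r"0|[+-]?[1-9][0-9]*")


def classify(lexeme):
    """Return the token type the second approach assigns to a whole lexeme, or None."""
    if DATENUM.fullmatch(lexeme):
        return "DATENUM"
    if INT.fullmatch(lexeme):
        return "INT"
    return None


for text in ["07", "7", "0", "-15", "00", "007"]:
    print(text, classify(text))
```

Under the first approach (`INT: '0'..'9'+;`) all of "07", "00", and "007" would be INTs, which is why that option pushes more checking into the later semantic pass.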

With these rules in the lexer, your date rule would be either:

date: INT '/' INT ( '/' INT )?;

Or:

date: (INT | DATENUM) '/' (INT | DATENUM) ('/' (INT | DATENUM) )?;
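The broader date rule above is purely syntactic: any INT or DATENUM is accepted in every position, including a month of 99. The hand-written recursive-descent sketch below mirrors that rule over an already-lexed token list. It is not ANTLR-generated code. The token representation `(type, text)` and the tuple tree `("DATE", year, month, day)`, with `None` for a missing year, are assumptions loosely modelled on the question's `^('DATE' year month day)` rewrite.

```python
NUMBER_TYPES = {"INT", "DATENUM"}


def parse_date(tokens):
    """Parse (INT|DATENUM) '/' (INT|DATENUM) ('/' (INT|DATENUM))? into a tuple tree."""
    pos = 0

    def expect_number():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in NUMBER_TYPES:
            pos += 1
            return tokens[pos - 1][1]
        raise SyntaxError(f"expected INT or DATENUM at token {pos}")

    def expect_slash():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] == "/":
            pos += 1
            return
        raise SyntaxError(f"expected '/' at token {pos}")

    month = expect_number()
    expect_slash()
    day = expect_number()
    year = None
    # Optional ('/' year) suffix.
    if pos < len(tokens) and tokens[pos][0] == "/":
        expect_slash()
        year = expect_number()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    # No range checks here: that is the semantic pass's job.
    return ("DATE", year, month, day)


print(parse_date([("INT", "99"), ("/", "/"), ("INT", "99")]))
```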

After that, you would perform a semantic run over your AST to make sure that your dates are valid.
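A minimal sketch of such a semantic run, in Python rather than any particular ANTLR target. It assumes a hypothetical tuple-based AST in which date nodes look like `("DATE", year, month, day)` holding token text (year `None` when omitted) and any other tuple is an ordinary node whose children are visited. Months must be 1-12, and days must fit the month; when no year is given, February 29 is accepted because it exists in leap years.

```python
import calendar


def days_in_month(month, year=None):
    if month == 2:
        # No year given: accept Feb 29, since it exists in leap years.
        return 29 if year is None or calendar.isleap(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_date(month_text, day_text, year_text=None):
    """Semantic check on token text; int() also accepts zero-padded DATENUMs like '07'."""
    month = int(month_text)
    day = int(day_text)
    year = None if year_text is None else int(year_text)
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= days_in_month(month, year)


def find_invalid_dates(node):
    """Walk a tuple-based AST and collect the DATE nodes that fail the check."""
    bad = []
    if isinstance(node, tuple):
        if node and node[0] == "DATE":
            _, year, month, day = node
            if not is_valid_date(month, day, year):
                bad.append(node)
        else:
            for child in node:
                bad.extend(find_invalid_dates(child))
    return bad


tree = ("PROGRAM", ("DATE", None, "2", "29"), ("DATE", "2023", "02", "29"))
print(find_invalid_dates(tree))
```

Keeping this pass separate from the grammar means the same lexer and parser rules work unchanged whatever target language generates them.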

If you're dead set on performing semantic checks in your grammar, however, ANTLR supports semantic predicates in the parser, so you could write a date rule that checks the values like this:

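// isValidDate is a hypothetical helper you would define in @members (Java target assumed);
// it checks month and day, plus the year-dependent case when a year is present.
date: month=INT '/' day=INT ( '/' year=INT )?
      { isValidDate($month.text, $day.text, $year == null ? null : $year.text) }?
    ;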

When you do this, however, you are embedding language-specific code in your grammar, and it won't be portable across targets.