Antlr: Simplest way to recognize dates and numbers?

Question

What is the simplest (shortest, fewest rules, and no warnings) way to parse both valid dates and numbers in the same grammar? My problem is that a lexer rule that matches a valid month (1-12) will match every occurrence of 1-12, not only those inside dates. So if I just want to match a number, I need a parser rule like:

number: (MONTH|INT);

It only gets more complex when I add lexer rules for day and year. I want a parser rule for date like this:

date: month '/' day ( '/' year )? -> ^('DATE' year month day);

I don't care whether month, day and year are parser rules or lexer rules, as long as I end up with the same tree structure. I also need to be able to recognize numbers elsewhere, e.g.:

foo: STRING OP number -> ^(OP STRING number);
STRING: ('a'..'z')+;
OP: ('<'|'>');

Recommended answer

The problem is that you seem to want to perform both syntactic and semantic checking in your lexer and/or your parser. It's a common mistake, and something that is only possible in very simple languages.

What you really need to do is accept more broadly in the lexer and parser, and then perform semantic checks afterwards. How strict you are in your lexing is up to you, but you have two basic options, depending on whether or not you need to accept leading zeros in your days of the month: 1) be really accepting for your INTs, or 2) define DATENUM to accept only those tokens that are valid days but not valid INTs (zero-padded values such as 07). I recommend the second, because fewer semantic checks will be necessary later in the code: INTs will then be verifiable at the syntax level, and you'll only need to perform semantic checks on your dates. The first approach:

INT: '0'..'9'+;

The second approach:

DATENUM: '0' '1'..'9';
INT: '0' | SIGN? '1'..'9' '0'..'9'*;
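To make the split between the two token types concrete, here is a minimal Java sketch that mirrors the two lexer rules as regular expressions and classifies a whole string. It is only an illustration, not ANTLR-generated code: the class `TokenShapes` and the method `classify` are hypothetical names, and since the answer does not show the definition of SIGN, the sketch assumes it matches '+' or '-'.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of the DATENUM / INT split; not generated by ANTLR. */
final class TokenShapes {
    // Mirrors DATENUM: '0' '1'..'9'  (a zero-padded single digit such as 07)
    private static final Pattern DATENUM = Pattern.compile("0[1-9]");

    // Mirrors INT: '0' | SIGN? '1'..'9' '0'..'9'*
    // Assumption: SIGN is '+' or '-'; its definition is not shown in the answer.
    private static final Pattern INT = Pattern.compile("0|[+-]?[1-9][0-9]*");

    /** Returns "DATENUM", "INT", or "NONE" for the complete input string. */
    static String classify(String text) {
        if (DATENUM.matcher(text).matches()) {
            return "DATENUM";
        }
        if (INT.matcher(text).matches()) {
            return "INT";
        }
        return "NONE";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"07", "7", "12", "0", "-42", "007"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Note that this checks whole strings only. A real lexer scans a character stream, so an input like 007 would be split into several tokens rather than rejected outright.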

With these rules in the lexer, your date rule would be either:

date: INT '/' INT ( '/' INT )?;

or:

date: (INT | DATENUM) '/' (INT | DATENUM) ( '/' (INT | DATENUM) )?;

After that, you would make a semantic pass over your AST to make sure that your dates are valid.
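As a rough idea of what such a semantic check might look like with the Java target, the sketch below validates the month, day and optional year once their token text has been pulled out of the tree. The class `DateCheck`, its methods, and the choice of a leap year as the default when no year is given are all assumptions made for illustration; the answer does not prescribe any of them, and walking the AST itself is left out.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

/** Hypothetical helper for the semantic pass; not part of ANTLR or the original answer. */
final class DateCheck {
    // Assumption: when the year is omitted, validate against a leap year so 2/29 is allowed.
    private static final int DEFAULT_LEAP_YEAR = 2000;

    /** Returns true if month/day(/year) is a real calendar date. Pass null when the year was omitted. */
    static boolean isValidDate(int month, int day, Integer year) {
        int y = (year != null) ? year : DEFAULT_LEAP_YEAR;
        try {
            LocalDate.of(y, month, day); // throws DateTimeException for dates such as 4/31
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    /** Same check, starting from token text such as "07"; yearText may be null. */
    static boolean isValidDate(String monthText, String dayText, String yearText) {
        try {
            int month = Integer.parseInt(monthText);
            int day = Integer.parseInt(dayText);
            Integer year = (yearText != null) ? Integer.valueOf(yearText) : null;
            return isValidDate(month, day, year);
        } catch (NumberFormatException e) {
            return false; // text was not a plain integer
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("02", "29", "2024"));
        System.out.println(isValidDate("13", "1", null));
    }
}
```

Keeping the check in ordinary target-language code like this leaves the grammar itself free of embedded actions.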

If you're dead set on performing semantic checks in your grammar, however, ANTLR allows semantic predicates in the parser, so you could write a date rule that checks the values like this:

date: month=INT '/' day=INT ( '/' year=INT )? { $year == null ? (/* first check */) : (/* second check */) }? ;

When you do this, however, you are embedding language-specific code in your grammar, and it won't be portable across targets.
