What is a 'semantic predicate' in ANTLR?

Question

What is a semantic predicate in ANTLR?

Solution

ANTLR 4

For predicates in ANTLR 4, check out these Stack Overflow Q&As:


ANTLR 3

A semantic predicate is a way to enforce extra (semantic) rules upon grammar actions using plain code.

There are 3 types of semantic predicates:

  • validating semantic predicates;
  • gated semantic predicates;
  • disambiguating semantic predicates.

Example grammar

Let's say you have a block of text consisting only of numbers separated by commas, ignoring any whitespace. You would like to parse this input while making sure that the numbers are at most 3 digits "long" (at most 999). The following grammar (Numbers.g) does that:

grammar Numbers;

// entry point of this parser: it parses an input string consisting of at least
// one number, optionally followed by zero or more commas and numbers
parse
  :  number (',' number)* EOF
  ;

// matches a number that is between 1 and 3 digits long
number
  :  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;

// matches a single digit
Digit
  :  '0'..'9'
  ;

// ignore spaces
WhiteSpace
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;
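If it helps to picture what the generated parser does with these rules, here is a rough hand-written Java equivalent. It is only a sketch under a few assumptions. The class `NumbersSketch` and its methods are hypothetical, and this is not the code ANTLR generates. Errors are thrown as `IllegalArgumentException` rather than being reported on the console and recovered from, which is what the ANTLR 3 runtime does by default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of Numbers.g (NOT code generated by ANTLR).
// Errors are thrown as IllegalArgumentException instead of being reported and recovered from.
class NumbersSketch {

    // Lexer: every digit becomes a Digit token, ',' is a literal token, whitespace is skipped
    static List<Character> tokenize(String input) {
        List<Character> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                continue; // WhiteSpace : ... {skip();}
            }
            if ((c >= '0' && c <= '9') || c == ',') {
                tokens.add(c);
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        return tokens;
    }

    // parse : number (',' number)* EOF ;
    static List<String> parse(String input) {
        List<Character> tokens = tokenize(input);
        int[] pos = {0}; // index of the next token (what LT(1) would return)
        List<String> numbers = new ArrayList<>();
        numbers.add(number(tokens, pos));
        while (pos[0] < tokens.size() && tokens.get(pos[0]) == ',') {
            pos[0]++; // consume ','
            numbers.add(number(tokens, pos));
        }
        if (pos[0] != tokens.size()) { // EOF expected here
            throw new IllegalArgumentException(
                "syntax error: unexpected '" + tokens.get(pos[0]) + "' at token " + pos[0]);
        }
        return numbers;
    }

    // number : Digit Digit Digit | Digit Digit | Digit ;  (at most 3 Digit tokens)
    static String number(List<Character> tokens, int[] pos) {
        StringBuilder digits = new StringBuilder();
        while (digits.length() < 3 && pos[0] < tokens.size() && tokens.get(pos[0]) != ',') {
            digits.append(tokens.get(pos[0]++)); // only digits and ',' exist, so this is a Digit
        }
        if (digits.length() == 0) {
            throw new IllegalArgumentException("syntax error: expected Digit at token " + pos[0]);
        }
        return digits.toString();
    }
}
```

For example, `NumbersSketch.parse("123, 456, 7   , 89")` returns `[123, 456, 7, 89]`. The input `"123, 456, 7777   , 89"` fails right after `777`, because `number` stops after three digits and the fourth `7` is neither `,` nor the end of input.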


Testing

The grammar can be tested with the following class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("123, 456, 7   , 89");
        NumbersLexer lexer = new NumbersLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        NumbersParser parser = new NumbersParser(tokens);
        parser.parse();
    }
}

Test it by generating the lexer and parser, compiling all .java files and running the Main class:

java -cp antlr-3.2.jar org.antlr.Tool Numbers.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main

When doing so, nothing is printed to the console, which indicates that nothing went wrong. Try changing:

ANTLRStringStream in = new ANTLRStringStream("123, 456, 7   , 89");

into:

ANTLRStringStream in = new ANTLRStringStream("123, 456, 7777   , 89");

and do the test again: you will see an error appearing on the console right after the string 777.


Semantic Predicates

This brings us to the semantic predicates. Let's say you want to parse numbers between 1 and 10 digits long. A rule like:

number
  :  Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit
  |  Digit Digit Digit Digit Digit Digit Digit Digit Digit
     /* ... */
  |  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;

would become cumbersome. Semantic predicates can help simplify this type of rule.


1. Validating Semantic Predicates

A validating semantic predicate is nothing more than a block of code followed by a question mark:

RULE { /* a boolean expression in here */ }?

To solve the problem above using a validating semantic predicate, change the number rule in the grammar into:

number
@init { int N = 0; }
  :  (Digit { N++; } )+ { N <= 10 }?
  ;

The parts { int N = 0; } and { N++; } are plain Java statements. The first is executed when the parser "enters" the number rule, and the second each time a Digit is matched. The actual predicate is { N <= 10 }?, which causes the parser to throw a FailedPredicateException whenever a number is more than 10 digits long.

Test it by using the following ANTLRStringStream:

// all equal or less than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890"); 

which produces no exception, while the following does throw an exception:

// '12345678901' is more than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");
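The important detail is the order of evaluation. The `(Digit { N++; })+` loop first consumes every digit, and only then is `{ N <= 10 }?` checked. The following hand-written Java sketch mimics that under some assumptions. The class names are hypothetical. `PredicateFailedException` is a local stand-in, not ANTLR's `FailedPredicateException`. Whitespace is simply stripped up front, which is equivalent for this grammar because whitespace between Digit tokens is skipped anyway.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for ANTLR 3's FailedPredicateException (NOT the runtime class)
class PredicateFailedException extends RuntimeException {
    PredicateFailedException(String message) {
        super(message);
    }
}

// Hypothetical hand-written sketch of the validating-predicate version of `number`
class ValidatingNumbers {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // number @init { int N = 0; } : (Digit { N++; })+ { N <= 10 }? ;
    static String number(String text, int[] pos) {
        int n = 0;                          // @init { int N = 0; }
        int start = pos[0];
        while (pos[0] < text.length() && isDigit(text.charAt(pos[0]))) {
            pos[0]++;                       // match Digit
            n++;                            // { N++; }
        }
        if (n == 0) {
            throw new IllegalArgumentException("syntax error: expected Digit at index " + start);
        }
        // { N <= 10 }? is evaluated only after the loop has consumed every digit
        if (!(n <= 10)) {
            throw new PredicateFailedException(
                "rule number failed predicate: {N <= 10}? (N = " + n + ")");
        }
        return text.substring(start, pos[0]);
    }

    // parse : number (',' number)* EOF ;  (whitespace is stripped up front)
    static List<String> parse(String input) {
        String text = input.replaceAll("\\s+", "");
        int[] pos = {0};
        List<String> numbers = new ArrayList<>();
        numbers.add(number(text, pos));
        while (pos[0] < text.length() && text.charAt(pos[0]) == ',') {
            pos[0]++;                       // consume ','
            numbers.add(number(text, pos));
        }
        if (pos[0] != text.length()) {
            throw new IllegalArgumentException("syntax error at index " + pos[0]);
        }
        return numbers;
    }
}
```

`ValidatingNumbers.parse("1,23,1234567890")` returns `[1, 23, 1234567890]`. By contrast, `"1,23,12345678901"` throws `PredicateFailedException` with N = 11, because all eleven digits are consumed before the predicate runs.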


2. Gated Semantic Predicates

A gated semantic predicate is similar to a validating semantic predicate, only the gated version produces a syntax error instead of a FailedPredicateException.

The syntax of a gated semantic predicate is:

{ /* a boolean expression in here */ }?=> RULE

To instead solve the above problem using gated predicates to match numbers up to 10 digits long you would write:

number
@init { int N = 1; }
  :  ( { N <= 10 }?=> Digit { N++; } )+
  ;

Test it again with both:

// all equal or less than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890"); 

and:

// '12345678901' is more than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");

and you will see that the last one causes an error.
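Compared with the validating version, the difference is where the check happens. The gate `{ N <= 10 }?=>` is tested before each iteration of the loop, so once 10 digits have been matched the loop just stops and the 11th digit is left over. The `parse` rule then expects `','` or EOF but sees a digit, which is an ordinary syntax error. Below is a hypothetical hand-written Java sketch of this behaviour, with these assumptions: the class name is made up, errors are thrown instead of being reported and recovered from, and whitespace is stripped up front.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch of the gated-predicate version of `number`
class GatedNumbers {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // number @init { int N = 1; } : ( { N <= 10 }?=> Digit { N++; } )+ ;
    static String number(String text, int[] pos) {
        int n = 1;                          // @init { int N = 1; }
        int start = pos[0];
        // the gate { N <= 10 }?=> is tested before each iteration; when it is false,
        // the loop simply stops and any remaining Digit is left for the caller
        while (n <= 10 && pos[0] < text.length() && isDigit(text.charAt(pos[0]))) {
            pos[0]++;                       // match Digit
            n++;                            // { N++; }
        }
        if (pos[0] == start) {
            throw new IllegalArgumentException("syntax error: expected Digit at index " + start);
        }
        return text.substring(start, pos[0]);
    }

    // parse : number (',' number)* EOF ;  (whitespace is stripped up front)
    static List<String> parse(String input) {
        String text = input.replaceAll("\\s+", "");
        int[] pos = {0};
        List<String> numbers = new ArrayList<>();
        numbers.add(number(text, pos));
        while (pos[0] < text.length() && text.charAt(pos[0]) == ',') {
            pos[0]++;                       // consume ','
            numbers.add(number(text, pos));
        }
        if (pos[0] != text.length()) {
            // e.g. an 11th digit: it is neither ',' nor EOF, so this is a plain syntax error
            throw new IllegalArgumentException(
                "syntax error: unexpected '" + text.charAt(pos[0]) + "' at index " + pos[0]);
        }
        return numbers;
    }
}
```

With `"1,23,12345678901"`, this sketch stops `number` after ten digits and reports `syntax error: unexpected '1' at index 15`. No predicate-specific exception is involved.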


3. Disambiguating Semantic Predicates

The final type of predicate is a disambiguating semantic predicate, which looks a bit like a validating predicate ({boolean-expression}?), but acts more like a gated semantic predicate (no exception is thrown when the boolean expression evaluates to false). You can use it at the start of a rule to check some property of a rule and let the parser match said rule or not.

Let's say the example grammar creates Number tokens (a lexer rule instead of a parser rule) that match numbers in the range 0..999. Now in the parser, you'd like to distinguish between low and high numbers (low: 0..500, high: 501..999). This can be done with a disambiguating semantic predicate that inspects the next token in the stream (input.LT(1)) to check whether it's low or high.

A demo:

grammar Numbers;

parse
  :  atom (',' atom)* EOF
  ;

atom
  :  low  {System.out.println("low  = " + $low.text);}
  |  high {System.out.println("high = " + $high.text);}
  ;

low
  :  {Integer.valueOf(input.LT(1).getText()) <= 500}? Number
  ;

high
  :  Number
  ;

Number
  :  Digit Digit Digit
  |  Digit Digit
  |  Digit
  ;

fragment Digit
  :  '0'..'9'
  ;

WhiteSpace
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

If you now parse the string "123, 999, 456, 700, 89, 0", you'd see the following output:

low  = 123
high = 999
low  = 456
high = 700
low  = 89
low  = 0
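Both alternatives of `atom` start with a `Number` token, so the parser cannot choose between `low` and `high` on syntax alone. The predicate at the start of `low` settles the choice by peeking at `input.LT(1)`. The following hand-written Java sketch imitates that decision under these assumptions: the class and method names are hypothetical, and this is not ANTLR-generated code. The sketch returns the lines the actions would print as a list instead of calling `System.out.println`, and it throws on malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch of the low/high grammar (NOT ANTLR-generated code)
class DisambiguatingNumbers {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Lexer: Number tokens of 1 to 3 digits, ',' tokens, whitespace skipped
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;                                   // WhiteSpace : ... {skip();}
            } else if (c == ',') {
                tokens.add(",");
                i++;
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && i - start < 3 && isDigit(input.charAt(i))) {
                    i++;                               // Number : at most three Digits
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        return tokens;
    }

    // parse : atom (',' atom)* EOF ;  returns what the atom actions would print
    static List<String> parse(String input) {
        List<String> tokens = tokenize(input);
        List<String> printed = new ArrayList<>();
        int pos = 0;
        while (true) {
            // atom : low | high ;  both alternatives start with a Number token
            if (pos >= tokens.size() || tokens.get(pos).equals(",")) {
                throw new IllegalArgumentException("syntax error: expected Number at token " + pos);
            }
            String lt1 = tokens.get(pos);              // plays the role of input.LT(1).getText()
            if (Integer.valueOf(lt1) <= 500) {         // predicate of `low` picks the alternative
                printed.add("low  = " + lt1);
            } else {                                   // otherwise `high` is taken
                printed.add("high = " + lt1);
            }
            pos++;                                     // consume the Number
            if (pos == tokens.size()) {
                return printed;                        // EOF
            }
            if (!tokens.get(pos).equals(",")) {
                throw new IllegalArgumentException("syntax error: expected ',' at token " + pos);
            }
            pos++;                                     // consume ','
        }
    }
}
```

Running `DisambiguatingNumbers.parse("123, 999, 456, 700, 89, 0")` yields the same six lines as the output shown above.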
