What is a 'semantic predicate' in ANTLR?
What is a semantic predicate in ANTLR?
ANTLR 4
For predicates in ANTLR 4, check out the related Stack Overflow Q&As on that topic.
ANTLR 3
A semantic predicate is a way to enforce extra (semantic) rules upon grammar actions using plain code.
There are 3 types of semantic predicates:
- validating semantic predicates;
- gated semantic predicates;
- disambiguating semantic predicates.
Example grammar
Let's say you have a block of text consisting only of numbers separated by
commas, ignoring any white space. You would like to parse this input, making
sure that the numbers are at most 3 digits long (at most 999). The following
grammar (Numbers.g) does exactly that:
grammar Numbers;
// entry point of this parser: it parses an input string consisting of at least
// one number, optionally followed by zero or more commas and numbers
parse
: number (',' number)* EOF
;
// matches a number that is between 1 and 3 digits long
number
: Digit Digit Digit
| Digit Digit
| Digit
;
// matches a single digit
Digit
: '0'..'9'
;
// ignore spaces
WhiteSpace
: (' ' | '\t' | '\r' | '\n') {skip();}
;
Testing
The grammar can be tested with the following class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("123, 456, 7 , 89");
        NumbersLexer lexer = new NumbersLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        NumbersParser parser = new NumbersParser(tokens);
        parser.parse();
    }
}
Test it by generating the lexer and parser, compiling all .java files, and running the Main class:
java -cp antlr-3.2.jar org.antlr.Tool Numbers.g
javac -cp antlr-3.2.jar *.java
java -cp .:antlr-3.2.jar Main
When doing so, nothing is printed to the console, which indicates that nothing went wrong. Try changing:
ANTLRStringStream in = new ANTLRStringStream("123, 456, 7 , 89");
into:
ANTLRStringStream in = new ANTLRStringStream("123, 456, 7777 , 89");
and run the test again: you will see an error appear on the console right after the string 777.
Semantic Predicates
This brings us to the semantic predicates. Let's say you want to parse numbers between 1 and 10 digits long. A rule like:
number
: Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit
| Digit Digit Digit Digit Digit Digit Digit Digit Digit
/* ... */
| Digit Digit Digit
| Digit Digit
| Digit
;
would become cumbersome. Semantic predicates can help simplify this type of rule.
1. Validating Semantic Predicates
A validating semantic predicate is nothing more than a block of code followed by a question mark:
RULE { /* a boolean expression in here */ }?
To solve the problem above using a validating semantic predicate, change the number rule in the grammar into:
number
@init { int N = 0; }
: (Digit { N++; } )+ { N <= 10 }?
;
The parts { int N = 0; } and { N++; } are plain Java statements. The first is
executed when the parser "enters" the number rule, and the second runs each time
a Digit is matched. The actual predicate is { N <= 10 }?, which causes the parser
to throw a FailedPredicateException whenever a number is more than 10 digits long.
Test it by using the following ANTLRStringStream:
// all equal or less than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890");
which produces no exception, while the following does throw an exception:
// '12345678901' is more than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");
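To make the control flow of the validating predicate more concrete, here is a minimal hand-written sketch in plain Java. It is not the code ANTLR generates; the class name ValidatingPredicateSketch, the method number and the exception PredicateFailed (a stand-in for ANTLR's FailedPredicateException) are all hypothetical names made up for illustration. The point it tries to show is that all digits are consumed first and the check only happens afterwards.

```java
// Hand-written illustration only; this is NOT the code ANTLR generates.
class ValidatingPredicateSketch {

    // Hypothetical stand-in for ANTLR's FailedPredicateException.
    static class PredicateFailed extends RuntimeException {
        PredicateFailed(String message) {
            super(message);
        }
    }

    // Mirrors: number @init { int N = 0; } : (Digit { N++; })+ { N <= 10 }? ;
    // Returns the index just past the number that starts at 'start'.
    static int number(String input, int start) {
        int n = 0;                                   // @init { int N = 0; }
        int pos = start;
        while (pos < input.length()
                && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;                                   // match one Digit
            n++;                                     // { N++; }
        }
        if (n == 0) {
            throw new IllegalArgumentException("expected a digit at index " + start);
        }
        if (!(n <= 10)) {                            // { N <= 10 }? checked after matching
            throw new PredicateFailed("N <= 10 failed after " + n + " digits");
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(number("1234567890", 0)); // ten digits pass the check
        try {
            number("12345678901", 0);                // eleven digits fail it
        } catch (PredicateFailed e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check sits after the loop, the over-long number is matched in full before it is rejected, which is why the failure surfaces as a predicate exception rather than a syntax error.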
2. Gated Semantic Predicates
A gated semantic predicate is similar to a validating semantic predicate,
except that the gated version produces a syntax error instead of a FailedPredicateException.
The syntax of a gated semantic predicate is:
{ /* a boolean expression in here */ }?=> RULE
To solve the above problem with a gated predicate instead, again matching numbers up to 10 digits long, you would write:
number
@init { int N = 1; }
: ( { N <= 10 }?=> Digit { N++; } )+
;
Test it again with both:
// all equal or less than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890");
and:
// '12345678901' is more than 10 digits
ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");
and you will see that the last one produces an error.
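The difference from the validating version can be sketched in plain Java as well. Again, this is a hand-written illustration and not ANTLR's generated code; GatedPredicateSketch, number and parse are hypothetical names, and IllegalArgumentException merely stands in for the parser's syntax error reporting. Here the predicate is part of the loop condition, so the eleventh digit is never consumed by the number rule and is left over as unexpected input.

```java
// Hand-written illustration only; this is NOT the code ANTLR generates.
class GatedPredicateSketch {

    // Mirrors: number @init { int N = 1; } : ( { N <= 10 }?=> Digit { N++; } )+ ;
    // Returns the index just past the digits this rule was allowed to consume.
    static int number(String input, int start) {
        int n = 1;                                   // @init { int N = 1; }
        int pos = start;
        while (n <= 10                               // { N <= 10 }?=> gates the alternative
                && pos < input.length()
                && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++;                                   // match one Digit
            n++;                                     // { N++; }
        }
        if (pos == start) {
            throw new IllegalArgumentException("syntax error: expected a digit at index " + start);
        }
        return pos;
    }

    // Mirrors: parse : number (',' number)* EOF ;
    static void parse(String input) {
        int pos = number(input, 0);
        while (pos < input.length() && input.charAt(pos) == ',') {
            pos = number(input, pos + 1);
        }
        if (pos != input.length()) {                 // EOF expected here
            throw new IllegalArgumentException("syntax error: unexpected input at index " + pos);
        }
    }

    public static void main(String[] args) {
        parse("1,23,1234567890");                    // accepted silently
        try {
            parse("1,23,12345678901");               // eleventh digit is left over
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that N starts at 1 here rather than 0, because the predicate is evaluated before each digit is matched instead of after all of them.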
3. Disambiguating Semantic Predicates
The final type of predicate is a disambiguating semantic predicate, which looks a bit like a validating predicate ({boolean-expression}?), but acts more like a gated semantic predicate (no exception is thrown when the boolean expression evaluates to false). You can use it at the start of a rule to check some property of a rule and let the parser match said rule or not.
Let's say the example grammar creates Number tokens (a lexer rule instead of a parser rule) that match numbers in the range 0..999. Now in the parser, you'd like to distinguish between low and high numbers (low: 0..500, high: 501..999). This can be done with a disambiguating semantic predicate in which you inspect the next token in the stream (input.LT(1)) to check whether it is low or high.
A demo:
grammar Numbers;
parse
: atom (',' atom)* EOF
;
atom
: low {System.out.println("low = " + $low.text);}
| high {System.out.println("high = " + $high.text);}
;
low
: {Integer.valueOf(input.LT(1).getText()) <= 500}? Number
;
high
: Number
;
Number
: Digit Digit Digit
| Digit Digit
| Digit
;
fragment Digit
: '0'..'9'
;
WhiteSpace
: (' ' | '\t' | '\r' | '\n') {skip();}
;
If you now parse the string "123, 999, 456, 700, 89, 0", you'd see the following output:
low = 123
high = 999
low = 456
high = 700
low = 89
low = 0
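The decision the parser makes in the atom rule can be approximated with the following plain Java sketch. It is a simplified illustration under stated assumptions, not ANTLR's generated code: DisambiguatingPredicateSketch, atom and parse are hypothetical names, and splitting on commas replaces the real lexer. The lookaheadText parameter plays the role of input.LT(1).getText().

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; this is NOT the code ANTLR generates.
class DisambiguatingPredicateSketch {

    // Mirrors: atom : low | high ;
    // 'low' is only chosen when its predicate holds for the lookahead token;
    // otherwise the parser falls through to 'high' without raising an error.
    static String atom(String lookaheadText) {
        if (Integer.parseInt(lookaheadText) <= 500) { // {... <= 500}? Number
            return "low = " + lookaheadText;
        }
        return "high = " + lookaheadText;             // high : Number ;
    }

    // Mirrors: parse : atom (',' atom)* EOF ; with a trivial stand-in lexer.
    static List<String> parse(String input) {
        List<String> output = new ArrayList<>();
        for (String token : input.split(",")) {
            output.add(atom(token.trim()));           // trim() plays the role of skip()
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : parse("123, 999, 456, 700, 89, 0")) {
            System.out.println(line);
        }
    }
}
```

Both low and high match the same Number token, so without the predicate the two alternatives of atom would be ambiguous; the predicate is what lets the parser pick one based on the token's value.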