Token return values in ANTLR 3 C
Question
I'm new to ANTLR, and I'm attempting to write a simple parser using the C language target (antlr3c). The grammar is simple enough that I'd like to have each rule return a value, e.g.:
number returns [long value]
    :
    ( INT {$value = $INT.ivalue;}
    | HEX {$value = $HEX.hvalue;}
    )
    ;

HEX returns [long hvalue]
    : '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+ {$hvalue = strtol((char*)$text->chars,NULL,16);}
    ;

INT returns [long ivalue]
    : '0'..'9'+ {$ivalue = strtol((char*)$text->chars,NULL,10);}
    ;
Each rule collects the return values of its child rules until the topmost rule returns a nice struct full of my data.
As far as I can tell, ANTLR allows lexer rules (tokens, e.g. 'INT' and 'HEX') to return values just like parser rules (e.g. 'number'). However, the generated C code will not compile:
error C2228: left of '.ivalue' must have class/struct/union
error C2228: left of '.hvalue' must have class/struct/union
I did some poking around, and the errors make sense: the tokens end up as a generic ANTLR3_COMMON_TOKEN_struct, which doesn't allow for a return value. So maybe the C target just doesn't support this feature. But like I said, I'm new to this, and before I go haring off to find another approach I want to confirm that I can't do it this way.
So the question is this: does antlr3c support return values for lexer rules, and if so, what is the proper way to use them?
Recommended answer
Not really any new information, just some details on what @bemace already mentioned.
No, lexer rules cannot have return values. See section 4.3, Rules, in The Definitive ANTLR Reference:
Rule Arguments and Return Values
Just like function calls, ANTLR parser and tree parser rules can have arguments and return values. ANTLR lexer rules cannot have return values [...]
There are two options:
Option 1
You can do the conversion to a long in the parser rule number:
number returns [long value]
    : INT {$value = Long.parseLong($INT.text);}
    | HEX {$value = Long.parseLong($HEX.text.substring(2), 16);}
    ;
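Since the question targets C rather than Java, the same conversion might be sketched in plain C roughly as follows. The helper name token_text_to_long is hypothetical and is not part of the ANTLR runtime. In a C-target action it would presumably be called with the token's text, for example (char*)$INT.text->chars, mirroring the $text->chars cast used in the question; that call has not been verified against the C runtime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an ANTLR API): convert the text of an INT or
 * HEX token to a long. A leading "0x" selects base 16, otherwise base 10,
 * matching the two lexer rules in the grammar. */
long token_text_to_long(const char *text)
{
    if (strncmp(text, "0x", 2) == 0) {
        /* Skip the "0x" prefix and parse the remaining hex digits. */
        return strtol(text + 2, NULL, 16);
    }
    return strtol(text, NULL, 10);
}
```

With this helper, token_text_to_long("0xCafE") yields 51966 and token_text_to_long("42") yields 42. Keeping the conversion in one function means the parser rule only has to pass the token text along, and the lexer rules need no return values.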
Option 2
Or create your own token class that has, for example, a toLong(): long method:
import org.antlr.runtime.*;

public class YourToken extends CommonToken {

    public YourToken(CharStream input, int type, int channel, int start, int stop) {
        super(input, type, channel, start, stop);
    }

    // your custom method
    public long toLong() {
        String text = super.getText();
        int radix = text.startsWith("0x") ? 16 : 10;
        if (radix == 16) text = text.substring(2);
        return Long.parseLong(text, radix);
    }
}
Then set TokenLabelType in the options {...} section of your grammar so that this token type is used, and override the emit(): Token method in your lexer class:
grammar Foo;

options {
    TokenLabelType=YourToken;
}

@lexer::members {
    public Token emit() {
        YourToken t = new YourToken(input, state.type, state.channel,
                state.tokenStartCharIndex, getCharIndex()-1);
        t.setLine(state.tokenStartLine);
        t.setText(state.text);
        t.setCharPositionInLine(state.tokenStartCharPositionInLine);
        emit(t);
        return t;
    }
}

parse
    : number {System.out.println("parsed: "+$number.value);} EOF
    ;

number returns [long value]
    : INT {$value = $INT.toLong();}
    | HEX {$value = $HEX.toLong();}
    ;

HEX
    : '0' 'x' ('0'..'9'|'a'..'f'|'A'..'F')+
    ;

INT
    : '0'..'9'+
    ;
When you generate the parser and lexer and run this test class:
import org.antlr.runtime.*;
import java.io.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("0xCafE");
        FooLexer lexer = new FooLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        FooParser parser = new FooParser(tokens);
        parser.parse();
    }
}
it will produce the following output:
parsed: 51966
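As a sanity check on that value, 0xCafE is 12*16^3 + 10*16^2 + 15*16 + 14 = 49152 + 2560 + 240 + 14 = 51966. The same left-to-right accumulation can be sketched in C; the function name hex_digits_to_long is illustrative only, and the sketch assumes its input contains nothing but valid hex digits (no "0x" prefix).

```c
#include <ctype.h>

/* Illustrative helper (hypothetical name): accumulate hex digits left to
 * right, multiplying the running total by 16 before adding each digit.
 * Assumes every character is a valid hex digit. */
long hex_digits_to_long(const char *digits)
{
    long value = 0;
    for (; *digits != '\0'; ++digits) {
        int c = tolower((unsigned char)*digits);
        int d = isdigit(c) ? c - '0' : c - 'a' + 10;
        value = value * 16 + d;
    }
    return value;
}
```

Calling hex_digits_to_long("CafE") walks through 12, 202, 3247 and ends at 51966, which agrees with the parser output.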
The first option seems the more practical one in your case.
Note that, as you can see, the examples given are in Java. I have no idea if option 2 is supported in the C target/runtime. I decided to post it anyway so that it can serve as a future reference here on SO.