How can I modify the text of tokens in a CommonTokenStream with ANTLR?
Problem description
I'm trying to learn ANTLR and at the same time use it for a current project.
I've gotten to the point where I can run the lexer on a chunk of code and output it to a CommonTokenStream. This is working fine, and I've verified that the source text is being broken up into the appropriate tokens.
Now, I would like to be able to modify the text of certain tokens in this stream, and display the now modified source code.
For example I've tried:
```java
import org.antlr.runtime.*;
import java.util.*;

public class LexerTest
{
    public static final int IDENTIFIER_TYPE = 4;

    public static void main(String[] args)
    {
        String input = "public static void main(String[] args) { int myVar = 0; }";
        CharStream cs = new ANTLRStringStream(input);

        JavaLexer lexer = new JavaLexer(cs);
        CommonTokenStream tokens = new CommonTokenStream();
        tokens.setTokenSource(lexer);

        int size = tokens.size();
        for(int i = 0; i < size; i++)
        {
            Token token = (Token) tokens.get(i);
            if(token.getType() == IDENTIFIER_TYPE)
            {
                token.setText("V");
            }
        }
        System.out.println(tokens.toString());
    }
}
```
I'm trying to set the text of every Identifier token to the string literal "V".
Why are my changes to the token's text not reflected when I call tokens.toString()?
How am I supposed to know the various token type IDs? I stepped through with my debugger and saw that the ID for the IDENTIFIER tokens was "4" (hence my constant at the top). But how would I have known that otherwise? Is there some other way of mapping token type IDs to token names?
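A hedged sketch for the second question: ANTLR 3's Java target normally generates a `public static final int` constant for each token type in the generated lexer and parser (for a rule named `Identifier` that would be `JavaLexer.Identifier`, assuming your grammar uses that name), and the tool also writes a `<GrammarName>.tokens` file listing `NAME=type` pairs. The `TokenNames` class below is a hypothetical helper, not part of the ANTLR runtime; it reads text in that format and builds a type-to-name map. The sample contents are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of the ANTLR runtime): reads text in the
// ".tokens" format, one NAME=type or 'literal'=type entry per line.
public class TokenNames {

    static Map<Integer, String> parse(String tokensFileText) {
        Map<Integer, String> namesByType = new HashMap<>();
        for (String raw : tokensFileText.split("\\r?\\n")) {
            String line = raw.trim();
            // Split on the LAST '=' so literal entries such as '='=12 still parse.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) continue;                   // skip blank or malformed lines
            String name = line.substring(0, eq);
            int type = Integer.parseInt(line.substring(eq + 1).trim());
            namesByType.putIfAbsent(type, name);     // keep the first name seen for a type
        }
        return namesByType;
    }

    public static void main(String[] args) {
        // Made-up sample; the real names and numbers depend on your grammar.
        String sample = "Identifier=4\nIntegerLiteral=5\n'='=12\n";
        Map<Integer, String> names = parse(sample);
        System.out.println(names.get(4));   // Identifier
        System.out.println(names.get(12));  // '='
    }
}
```

When only one type is needed, comparing against the generated constant is simpler than a lookup table; the map is mainly useful for printing readable names while debugging.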
EDIT:
One thing that is important to me is that the tokens keep their original start and end character positions. That is, I don't want them to reflect their new positions after the variable names are changed to "V". This is so I know where the tokens were in the original source text.
Solution

ANTLR has a way to do this in its grammar file.
Let's say you're parsing a string consisting of numbers and strings delimited by commas. A grammar would look like this:
```
grammar Foo;

parse
  :  value ( ',' value )* EOF
  ;

value
  :  Number
  |  String
  ;

String
  :  '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  ( ' ' | '\t' ) {skip();}
  ;
```
This should all look familiar to you. Let's say you want to wrap square brackets around all integer values. Here's how to do that:
```
grammar Foo;

options {output=template; rewrite=true;}

parse
  :  value ( ',' value )* EOF
  ;

value
  :  n=Number -> template(num={$n.text}) "[<num>]"
  |  String
  ;

String
  :  '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  ( ' ' | '\t' ) {skip();}
  ;
```
As you can see, I've added some `options` at the top, and added a rewrite rule (everything after the `->`) after the `Number` in the `value` parser rule.

Now to test it all, compile and run this class:
```java
import org.antlr.runtime.*;

public class FooTest {
    public static void main(String[] args) throws Exception {
        String text = "12, \"34\", 56, \"a\\\"b\", 78";
        System.out.println("parsing: "+text);
        ANTLRStringStream in = new ANTLRStringStream(text);
        FooLexer lexer = new FooLexer(in);
        CommonTokenStream tokens = new TokenRewriteStream(lexer); // Note: a TokenRewriteStream!
        FooParser parser = new FooParser(tokens);
        parser.parse();
        System.out.println("tokens: "+tokens.toString());
    }
}
```
which produces:
```
parsing: 12, "34", 56, "a\"b", 78
tokens: [12],"34",[56],"a\"b",[78]
```
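To illustrate why a rewrite stream also meets the requirement that tokens keep their original positions, here is a self-contained toy model in plain Java. It does not use ANTLR: `RewriteSketch`, `Tok` and the hand-written `lex` method are hypothetical stand-ins for `TokenRewriteStream`, `CommonToken` and `FooLexer`, and the string lexing is simplified. Tokens are never mutated; replacement text is stored by token index and applied only when the stream is rendered, so every token's `start`/`stop` offsets still point into the original input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a rewrite stream (NOT the ANTLR API): tokens stay immutable,
// edits are recorded per token index and applied only when rendering.
public class RewriteSketch {

    // Immutable token: type, original text, inclusive [start, stop] offsets into the input.
    static final class Tok {
        final String type, text;
        final int start, stop;
        Tok(String type, String text, int start, int stop) {
            this.type = type; this.text = text; this.start = start; this.stop = stop;
        }
    }

    // Hypothetical stand-in for FooLexer: emits Number, String and ',' tokens, skips spaces/tabs.
    static List<Tok> lex(String in) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t') {
                i++;                                        // like Space {skip();}
            } else if (c >= '0' && c <= '9') {
                while (i < in.length() && in.charAt(i) >= '0' && in.charAt(i) <= '9') i++;
                toks.add(new Tok("Number", in.substring(start, i), start, i - 1));
            } else if (c == '"') {
                i++;                                        // opening quote
                while (i < in.length() && in.charAt(i) != '"') {
                    i += in.charAt(i) == '\\' ? 2 : 1;      // simplified: backslash escapes next char
                }
                if (i >= in.length()) throw new IllegalArgumentException("unterminated string at " + start);
                i++;                                        // closing quote
                toks.add(new Tok("String", in.substring(start, i), start, i - 1));
            } else if (c == ',') {
                i++;
                toks.add(new Tok("','", ",", start, start));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return toks;
    }

    final List<Tok> tokens;
    private final Map<Integer, String> replacements = new HashMap<>();

    RewriteSketch(List<Tok> tokens) { this.tokens = tokens; }

    // Record an edit; the Tok itself (text and offsets) is left untouched.
    void replace(int index, String text) { replacements.put(index, text); }

    // Render the tokens, substituting any recorded replacement text.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(replacements.getOrDefault(i, tokens.get(i).text));
        }
        return sb.toString();
    }

    // Same effect as the template rewrite in the grammar: wrap each Number in [ ].
    static RewriteSketch bracketNumbers(String input) {
        RewriteSketch rs = new RewriteSketch(lex(input));
        for (int i = 0; i < rs.tokens.size(); i++) {
            Tok t = rs.tokens.get(i);
            if (t.type.equals("Number")) rs.replace(i, "[" + t.text + "]");
        }
        return rs;
    }

    public static void main(String[] args) {
        RewriteSketch rs = bracketNumbers("12, \"34\", 56, \"a\\\"b\", 78");
        System.out.println("tokens: " + rs);                                      // tokens: [12],"34",[56],"a\"b",[78]
        Tok first = rs.tokens.get(0);
        System.out.println(first.text + " @ " + first.start + ".." + first.stop); // 12 @ 0..1
    }
}
```

ANTLR 3's real `TokenRewriteStream` offers operations in the same spirit (for example `replace(int index, Object text)`, `insertBefore`, `delete` and `toOriginalString()`), so the identifier renaming from the question could likely be done by calling `replace` on the stream instead of mutating tokens; check the runtime's Javadoc for your version before relying on the exact signatures.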