Parsing string interpolation in ANTLR

Question

I'm working on a simple string manipulation DSL for internal purposes, and I would like the language to support string interpolation as it is used in Ruby.

For example:

name = "Bob"
msg = "Hello ${name}!"
print(msg)   # prints "Hello Bob!"

I'm attempting to implement my parser in ANTLR v3, but I'm fairly inexperienced with ANTLR, so I'm unsure how to implement this feature. So far, I've specified my string literals in the lexer, but in this case I'll obviously need to handle the interpolation content in the parser.

My current string literal grammar looks like this:

STRINGLITERAL : '"' ( StringEscapeSeq | ~( '\\' | '"' | '\r' | '\n' ) )* '"' ;
fragment StringEscapeSeq : '\\' ( 't' | 'n' | 'r' | '"' | '\\' | '$' | ('0'..'9')) ;

Moving the string literal handling into the parser seems to make everything else stop working as it should, and cursory web searches didn't turn up anything useful. Any suggestions on how to get started?

Recommended answer

I'm no ANTLR expert, but here's a possible grammar:

grammar Str;

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' (Identifier | stringLiteral) ')' 
    ;

assignment
    :    Identifier (Space)* '=' (Space)* stringLiteral
    ;

stringLiteral
    :    '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
    ;

Interpolation
    :    '${' Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;

As you can see, there are several (Space)* occurrences in the example grammar. This is because stringLiteral is a parser rule rather than a lexer rule. Therefore, when tokenizing the source file, the lexer cannot know whether a whitespace character is part of a string literal or is just whitespace in the source file that can be ignored.
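
To make that point concrete, here is a minimal plain-Java sketch of a lexer that, like the one generated from the grammar, has no notion of being "inside" a string. It is not ANTLR code: the class name `ContextFreeLexerSketch`, the method `tokenize`, and the `Char(...)` token name are hypothetical and only loosely mirror the `Space` and `Identifier` rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated lexer; it is NOT ANTLR code.
final class ContextFreeLexerSketch {

    // Splits the input into token names without tracking whether
    // the current position is inside a string literal.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                // Same token kind inside and outside of quotes.
                tokens.add("Space");
                i++;
            } else if (isIdentStart(c)) {
                int start = i;
                while (i < src.length() && isIdentPart(src.charAt(i))) {
                    i++;
                }
                tokens.add("Identifier(" + src.substring(start, i) + ")");
            } else {
                // Everything else is emitted as a single-character token.
                tokens.add("Char(" + c + ")");
                i++;
            }
        }
        return tokens;
    }

    static boolean isIdentStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isIdentPart(char c) {
        return isIdentStart(c) || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(tokenize("msg = \"Hi Bob\""));
    }
}
```

For the input `msg = "Hi Bob"`, the space between `Hi` and `Bob` comes out as the same `Space` token as the spaces around `=`. Only the parser, which knows it is between two quote tokens, can decide to keep one and ignore the other.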

I tested the example with a little Java class, and everything worked as expected:

/* the same grammar, but now with a bit of Java code in it */
grammar Str;

@parser::header {
    package antlrdemo;
    import java.util.HashMap;
}

@lexer::header {
    package antlrdemo;
}

@parser::members {
    HashMap<String, String> vars = new HashMap<String, String>();
}

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' 
         (    id=Identifier    {System.out.println("> "+vars.get($id.text));} 
         |    st=stringLiteral {System.out.println("> "+$st.value);}
         ) 
         ')' 
    ;

assignment
    :    id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
    ;

stringLiteral returns [String value]
    :    '"'
        {StringBuilder b = new StringBuilder();} 
        (    id=Identifier           {b.append($id.text);}
        |    es=EscapeSequence       {b.append($es.text);}
        |    ch=(NormalChar | Space) {b.append($ch.text);}
        |    in=Interpolation        {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
        )* 
        '"'
        {$value = b.toString();}
    ;

Interpolation
    :    '${' i=Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;

And a class with a main method to test it all:

package antlrdemo;

import org.antlr.runtime.*;

public class ANTLRDemo {
    public static void main(String[] args) throws RecognitionException {
        String source = "name = \"Bob\";        \n"+
                "msg = \"Hello ${name}\";       \n"+
                "print(msg);                    \n"+
                "print(\"Bye \\${for} now!\");    ";
        ANTLRStringStream in = new ANTLRStringStream(source);
        StrLexer lexer = new StrLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        StrParser parser = new StrParser(tokens);
        parser.parse();
    }
}

which produces the following output:

> Hello Bob
> Bye \${for} now!
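
Note that the second line still contains the backslash, because the EscapeSequence alternative appends $es.text unchanged. If the escape character should be dropped instead, one option is a small helper called from that action. The following is only a sketch and not part of the tested grammar: `EscapeSketch` and `unescape` are hypothetical names, and it assumes the only escapes are the two-character sequences allowed by SpecialChar (`\"`, `\\` and `\$`).

```java
// Hypothetical helper, not part of the grammar above.
final class EscapeSketch {

    // Drops the leading backslash from a two-character escape sequence
    // such as \$ or \"; any other text is returned unchanged.
    static String unescape(String escapeText) {
        if (escapeText.length() == 2 && escapeText.charAt(0) == '\\') {
            return escapeText.substring(1);
        }
        return escapeText;
    }

    public static void main(String[] args) {
        // The Java literal "\\$" is the two characters \ and $.
        System.out.println("Bye " + unescape("\\$") + "{for} now!");
    }
}
```

With a helper like this on the parser's classpath, the action could presumably become `{b.append(EscapeSketch.unescape($es.text));}`, so that the last statement prints `Bye ${for} now!` without the backslash.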

Again, I am no expert, but this (at least) gives you a way to solve it.

HTH.
