Parsing string interpolation in ANTLR

Question

I'm working on a simple string manipulation DSL for internal purposes, and I would like the language to support string interpolation the way Ruby does.

For example:

name = "Bob"
msg = "Hello ${name}!"
print(msg)   # prints "Hello Bob!"

I'm attempting to implement my parser in ANTLRv3, but I'm pretty inexperienced with ANTLR, so I'm unsure how to implement this feature. So far I've specified my string literals in the lexer, but in this case I'll obviously need to handle the interpolation content in the parser.

My current string literal grammar looks like this:

STRINGLITERAL : '"' ( StringEscapeSeq | ~( '\\' | '"' | '\r' | '\n' ) )* '"' ;
fragment StringEscapeSeq : '\\' ( 't' | 'n' | 'r' | '"' | '\\' | '$' | ('0'..'9')) ;

Moving the string literal handling into the parser seems to make everything else stop working as it should. Cursory web searches didn't yield any information. Any suggestions on how to get started with this?

Recommended answer

I'm no ANTLR expert, but here's a possible grammar:

grammar Str;

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' (Identifier | stringLiteral) ')' 
    ;

assignment
    :    Identifier (Space)* '=' (Space)* stringLiteral
    ;

stringLiteral
    :    '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
    ;

Interpolation
    :    '${' Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;

As you can see, the example grammar contains several (Space)*-es. That is because stringLiteral is a parser rule rather than a lexer rule. Therefore, when tokenizing the source file, the lexer cannot know whether a white space is part of a string literal or just ignorable white space in the source file, so Space tokens cannot simply be skipped and have to be matched explicitly in the parser rules.
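
A different route, not the one taken in this answer, would be to keep the whole string literal as a single lexer token (as in the question's STRINGLITERAL rule) and resolve the interpolation afterwards in plain Java, which sidesteps the white-space issue. The following is only a minimal sketch under that assumption: the class name Interpolator, the method interpolate and the regular expression are made up for illustration, and the input is assumed to be the literal's text with the surrounding quotes already removed. Unlike the grammar actions in this answer, it drops the backslash of an escape sequence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the grammar in this answer or of the ANTLR runtime.
final class Interpolator {
    // Group 1: an escaped char (\" or \\ or \$). Group 2: the name inside ${name}.
    private static final Pattern PART =
            Pattern.compile("\\\\([\"\\\\$])|\\$\\{([A-Za-z_][A-Za-z_0-9]*)\\}");

    // body is the string literal's content without the surrounding quotes (assumption).
    static String interpolate(String body, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        Matcher m = PART.matcher(body);
        int last = 0;
        while (m.find()) {
            out.append(body, last, m.start());     // plain text before the match
            if (m.group(1) != null) {
                out.append(m.group(1));            // escaped char, backslash dropped
            } else {
                out.append(vars.get(m.group(2)));  // appends "null" if the variable is undefined
            }
            last = m.end();
        }
        out.append(body, last, body.length());     // trailing plain text
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("name", "Bob");
        System.out.println(interpolate("Hello ${name}!", vars));    // prints Hello Bob!
        System.out.println(interpolate("Bye \\${for} now!", vars)); // prints Bye ${for} now!
    }
}
```

The trade-off is that interpolation errors are no longer reported by the parser, and this sketch only handles plain identifiers inside ${...}, not arbitrary expressions.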

I tested the example with a little Java class and all worked as expected:

/* the same grammar, but now with a bit of Java code in it */
grammar Str;

@parser::header {
    package antlrdemo;
    import java.util.HashMap;
}

@lexer::header {
    package antlrdemo;
}

@parser::members {
    HashMap<String, String> vars = new HashMap<String, String>();
}

parse
    :    ((Space)* statement (Space)* ';')+ (Space)* EOF
    ;

statement
    :    print | assignment
    ;

print
    :    'print' '(' 
         (    id=Identifier    {System.out.println("> "+vars.get($id.text));} 
         |    st=stringLiteral {System.out.println("> "+$st.value);}
         ) 
         ')' 
    ;

assignment
    :    id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
    ;

stringLiteral returns [String value]
    :    '"'
        {StringBuilder b = new StringBuilder();} 
        (    id=Identifier           {b.append($id.text);}
        |    es=EscapeSequence       {b.append($es.text);}
        |    ch=(NormalChar | Space) {b.append($ch.text);}
        |    in=Interpolation        {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
        )* 
        '"'
        {$value = b.toString();}
    ;

Interpolation
    :    '${' i=Identifier '}'
    ;

Identifier
    :    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
    ;

EscapeSequence
    :    '\\' SpecialChar
    ;

SpecialChar
    :     '"' | '\\' | '$'
    ;

Space
    :    (' ' | '\t' | '\r' | '\n')
    ;

NormalChar
    :    ~SpecialChar
    ;
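
The Interpolation alternative in stringLiteral recovers the variable name by slicing the token text: it drops the leading "${" (two characters) and the trailing "}" (one character). Pulled out into a standalone helper, that step looks roughly like this; the class and method names are hypothetical and only restate the inline substring call.

```java
// Hypothetical helper mirroring the inline substring call in the stringLiteral rule.
final class InterpolationName {
    // tokenText is the full text of an Interpolation token, e.g. "${name}".
    static String nameOf(String tokenText) {
        // skip the 2-char prefix "${" and stop before the 1-char suffix "}"
        return tokenText.substring(2, tokenText.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(nameOf("${name}")); // prints name
    }
}
```

Note that vars.get returns null for a name that was never assigned, and StringBuilder.append then appends the text "null", so an undefined variable is not reported as an error by the grammar as written.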

And a class with a main method to test it all:

package antlrdemo;

import org.antlr.runtime.*;

public class ANTLRDemo {
    public static void main(String[] args) throws RecognitionException {
        String source = "name = \"Bob\";        \n"+
                "msg = \"Hello ${name}\";       \n"+
                "print(msg);                    \n"+
                "print(\"Bye \\${for} now!\");    ";
        ANTLRStringStream in = new ANTLRStringStream(source);
        StrLexer lexer = new StrLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        StrParser parser = new StrParser(tokens);
        parser.parse();
    }
}

This produces the following output:

> Hello Bob
> Bye \${for} now!
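
In the second output line the backslash in front of $ is still there, because the stringLiteral action appends the EscapeSequence token text ($es.text) verbatim. If an escape should instead collapse to the character it stands for, a small helper along these lines could be called from that action, for example as b.append(Escapes.unescape($es.text)). This is only a sketch: the class name Escapes and the method unescape are hypothetical, and it assumes the token text is always a backslash followed by exactly one character, as the EscapeSequence rule defines it.

```java
// Hypothetical helper; not part of the original grammar or of the ANTLR runtime.
final class Escapes {
    // Turns the text of an EscapeSequence token (backslash + one char) into the
    // single character it represents; any other input is returned unchanged.
    static String unescape(String tokenText) {
        boolean isEscape = tokenText.length() == 2 && tokenText.charAt(0) == '\\';
        return isEscape ? tokenText.substring(1) : tokenText;
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\$"));  // prints $
        System.out.println(unescape("\\\"")); // prints "
        System.out.println(unescape("\\\\")); // prints \
    }
}
```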

Again, I am no expert, but this (at least) gives you a way to solve it.

HTH.
