How can I parse a special character differently in two terminal rules using ANTLR?

Published: 2020/5/25 1:35:23 | Tags: java, parsing, antlr, grammar

Question

I have a grammar that uses the $ character at the start of many terminal rules, such as $video{, $audio{, $image{, $link{ and others like them.

However, I'd also like to match all the $ and { and } characters that don't match these rules too. For example, my grammar does not properly match $100 in the CHUNK rule, but adding the $ to the long list of acceptable characters in CHUNK causes the other production rules to break.

How can I change my grammar so that it's smart enough to distinguish normal $, { and } characters from my special production rules?

Basically, what I'd like to be able to say is: "if the $ character isn't followed by {, video, image, audio, link, etc., then it should go to CHUNK."
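The rule stated above boils down to a prefix check at every '$'. A minimal Java sketch of that decision, assuming a hypothetical helper class `DollarRule` that is not part of the grammar (the prefixes are copied from the BEGIN_* rules below):

```java
import java.util.List;

// Hypothetical helper, not part of the grammar: decides whether the text at a
// given position starts one of the special commands or is ordinary CHUNK text.
public class DollarRule {
    // Command prefixes copied from the grammar's BEGIN_* lexer rules.
    static final List<String> COMMANDS =
        List.of("${", "$image{", "$video{", "$audio{", "$link{");

    /** Returns the matching command prefix, or null if pos holds plain text. */
    static String commandAt(String input, int pos) {
        for (String cmd : COMMANDS) {
            if (input.startsWith(cmd, pos)) {
                return cmd;
            }
        }
        return null; // e.g. the '$' in "$100": it belongs in CHUNK
    }

    public static void main(String[] args) {
        String input = "Costs $100, see $link{http://example.com}";
        System.out.println(commandAt(input, 6));  // null ("$1" is not a command)
        System.out.println(commandAt(input, 16)); // $link{
    }
}
```

Because none of the prefixes is a prefix of another, the order of `COMMANDS` does not matter here.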

grammar Text;

@header {
}

@lexer::members {
    private boolean readLabel = false;
    private boolean readUrl = false;
}

@members {
    private int numberOfVideos = 0;
    private int numberOfAudios = 0;
    private StringBuilder builder = new StringBuilder();

    public String getResult() {
        return builder.toString();
    }
}

text
    :   expression*
    ;

expression
    :   fillInTheBlank 
        {
            builder.append($fillInTheBlank.value);
        }
    |   image 
        {
            builder.append($image.value);
        }
    |   video
        {
            builder.append($video.value);
        }
    |   audio
        {
            builder.append($audio.value);
        }
    |   link
        {
            builder.append($link.value);
        }
    |   everythingElse
        {
            builder.append($everythingElse.value);
        }
    ;

fillInTheBlank returns [String value]
    :   BEGIN_INPUT LABEL END_COMMAND
        {
            $value = "<input type=\"text\" id=\"" +
                $LABEL.text +
                "\" name=\"" + 
                $LABEL.text +
                "\" class=\"FillInTheBlankAnswer\" />";
        }
    ;

image returns [String value]
    :   BEGIN_IMAGE URL END_COMMAND
        {
            $value = "<img src=\"" + $URL.text + "\" />";
        }
    ;

video returns [String value]
    :   BEGIN_VIDEO URL END_COMMAND
        {
            numberOfVideos++;

            StringBuilder b = new StringBuilder();
            b.append("<div id=\"video" + numberOfVideos + "\">Loading the player ...</div>\r\n");
            b.append("<script type=\"text/javascript\">\r\n");
            b.append("\tjwplayer(\"video" + numberOfVideos + "\").setup({\r\n");
            b.append("\t\tflashplayer: \"/trainingdividend/js/jwplayer/player.swf\", file: \"");
            b.append($URL.text);
            b.append("\"\r\n\t});\r\n");
            b.append("</script>\r\n");

            $value = b.toString();
        }
    ;

audio returns [String value]
    :   BEGIN_AUDIO URL END_COMMAND
        {
            numberOfAudios++;

            StringBuilder b = new StringBuilder();
            b.append("<p id=\"audioplayer_");
            b.append(numberOfAudios);
            b.append("\">Alternative content</p>\r\n");
            b.append("<script type=\"text/javascript\">\r\n");
            b.append("\tAudioPlayer.embed(\"audioplayer_");
            b.append(numberOfAudios);
            b.append("\", {soundFile: \"");
            b.append($URL.text);
            b.append("\"});\r\n");
            b.append("</script>\r\n");

            $value = b.toString();
        }
    ;   

link returns [String value]
    :   BEGIN_LINK URL END_COMMAND
        {
            $value = "<a href=\"" + $URL.text + "\">" + $URL.text + "</a>";
        }
    ;   

everythingElse returns [String value]
    :   CHUNK
        {
            $value = $CHUNK.text;
        }
    ;

BEGIN_INPUT
    :   '${' 
        { 
            readLabel = true; 
        }
    ;

BEGIN_IMAGE
    :   '$image{' 
        { 
            readUrl = true; 
        }
    ;

BEGIN_VIDEO
    :   '$video{' 
        { 
            readUrl = true; 
        }
    ;

BEGIN_AUDIO
    :   '$audio{' 
        { 
            readUrl = true; 
        }
    ;

BEGIN_LINK
    :   '$link{' 
        { 
            readUrl = true; 
        }
    ;

END_COMMAND
    :   { readLabel || readUrl }?=> '}' 
        { 
            readLabel = false; 
            readUrl = false;
        }
    ;

URL
    :   { readUrl }?=> 'http://' ('a'..'z'|'A'..'Z'|'0'..'9'|'.'|'/'|'-'|'_'|'%'|'&'|'?'|':')+
    ;

LABEL
    :   { readLabel }?=> ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
    ;

CHUNK
    //: (~('${'|'$video{'|'$image{'|'$audio{'))+
    :   ('a'..'z'|'A'..'Z'|'0'..'9'|' '|'\t'|'\n'|'\r'|'-'|','|'.'|'?'|'\''|':'|'\"'|'>'|'<'|'/'|'_'|'='|';'|'('|')'|'&'|'!'|'#'|'%'|'*')+
    ;

Answer

You can't negate more than a single character. So, the following is invalid:

~('${')

But why not simply add '$', '{' and '}' to your CHUNK rule and remove the + at the end of it? Otherwise CHUNK would gobble up too much, possibly a '$video{' further along in the source, as you have already noticed yourself.
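Why the + has to go can be seen with a simplified Java model (not ANTLR's generated lexer code). It assumes a hypothetical class `GreedyChunkDemo` and approximates the CHUNK character set with '$', '{' and '}' added:

```java
// Simplified model of the CHUNK rule after '$', '{' and '}' are added to it.
public class GreedyChunkDemo {
    // Approximately the question's CHUNK character set, plus '$', '{' and '}'.
    static boolean isChunkChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
            || " \t\n\r-,.?':\"></_=;()&!#%*${}".indexOf(c) >= 0;
    }

    /** Length of a CHUNK match at pos: a greedy run for CHUNK+, else one char. */
    static int chunkLength(String input, int pos, boolean greedyPlus) {
        if (!greedyPlus) {
            return pos < input.length() && isChunkChar(input.charAt(pos)) ? 1 : 0;
        }
        int end = pos;
        // The (...)+ loop continues while the next char is in the set, so it
        // never stops to give BEGIN_VIDEO a chance to match "$video{".
        while (end < input.length() && isChunkChar(input.charAt(end))) {
            end++;
        }
        return end - pos;
    }

    public static void main(String[] args) {
        String input = "Watch $video{";
        System.out.println(chunkLength(input, 0, true));  // 13: the whole input
        System.out.println(chunkLength(input, 0, false)); // 1: just 'W'
    }
}
```

With the single-character version, the lexer returns to its token decision after every character, so "$video{" can still be recognised as BEGIN_VIDEO.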

Now a CHUNK token will always consist of a single character, but you could create a production rule to fix this:

chunk : CHUNK+ ;

and use chunk in your production rules instead of CHUNK (or use CHUNK+, of course).
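Conceptually, `chunk : CHUNK+ ;` just reads a run of one-character CHUNK tokens as a single piece of text. A small Java sketch of that merging, assuming a hypothetical `ChunkMerger` class whose `Token` record is a stand-in, not ANTLR's `Token` interface (records need Java 16+):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "chunk : CHUNK+ ;": consecutive CHUNK tokens
// are concatenated into one run of plain text.
public class ChunkMerger {
    // Stand-in token type, not org.antlr.runtime.Token.
    record Token(String type, String text) {}

    static List<Token> mergeChunks(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Token t : tokens) {
            if (t.type().equals("CHUNK")) {
                run.append(t.text());            // extend the current text run
            } else {
                if (run.length() > 0) {          // flush the run before a command token
                    out.add(new Token("CHUNK", run.toString()));
                    run.setLength(0);
                }
                out.add(t);
            }
        }
        if (run.length() > 0) {
            out.add(new Token("CHUNK", run.toString()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
            new Token("CHUNK", "$"), new Token("CHUNK", "1"),
            new Token("CHUNK", "0"), new Token("CHUNK", "0"),
            new Token("BEGIN_VIDEO", "$video{"));
        System.out.println(mergeChunks(tokens));
        // [Token[type=CHUNK, text=$100], Token[type=BEGIN_VIDEO, text=$video{]]
    }
}
```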

Input like "{ } $foo $video{" would be tokenized as follows:

CHUNK        '{'
CHUNK        ' '
CHUNK        '}'
CHUNK        ' '
CHUNK        '$'
CHUNK        'f'
CHUNK        'o'
CHUNK        'o'
CHUNK        ' '
BEGIN_VIDEO  '$video{'
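The tokenization above can be reproduced with a hand-written Java model of the suggested lexer (a sketch, not ANTLR-generated code): try each BEGIN_* literal first, otherwise emit a one-character CHUNK. It assumes a hypothetical class `SingleCharChunkLexer` and only models text outside a command; END_COMMAND, URL and LABEL are left out because their gated predicates are off until a BEGIN_* token has been seen.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the lexer suggested in the answer (outside commands only).
public class SingleCharChunkLexer {
    // Token types and literals copied from the grammar's BEGIN_* rules.
    static final String[][] COMMANDS = {
        {"BEGIN_INPUT", "${"}, {"BEGIN_IMAGE", "$image{"}, {"BEGIN_VIDEO", "$video{"},
        {"BEGIN_AUDIO", "$audio{"}, {"BEGIN_LINK", "$link{"}
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        outer:
        while (pos < input.length()) {
            for (String[] cmd : COMMANDS) {
                if (input.startsWith(cmd[1], pos)) {
                    tokens.add(cmd[0] + " '" + cmd[1] + "'");
                    pos += cmd[1].length();
                    continue outer;
                }
            }
            // No command starts here, so exactly one character becomes a CHUNK.
            tokens.add("CHUNK '" + input.charAt(pos) + "'");
            pos++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("{ } $foo $video{")) {
            System.out.println(t);
        }
    }
}
```

Running `main` prints the same ten tokens as the listing, ending with BEGIN_VIDEO '$video{'.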

EDIT

And if you let your parser output an AST, you can easily merge all the text matched by one or more CHUNK tokens into a single AST node whose token is of type CHUNK, like this:

grammar Text;

options {
  output=AST;
}

...

chunk
  : CHUNK+ -> {new CommonTree(new CommonToken(CHUNK, $text))}
  ;

...
