Parsing a templating language

Views: 124 | Published: 2020/9/2 23:14:59 | Tags: antlr, antlr3

Question

I'm trying to parse a templating language, and I'm having trouble correctly parsing the arbitrary HTML that can appear between tags. What I have so far is below; any suggestions? An example of valid input would be:

{foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}This Should Be Parsed as a Buffer.{/bar2}

The grammar is:

grammar g;

options {
  language=Java;
  output=AST;
  ASTLabelType=CommonTree;
}

/* LEXER RULES */
tokens {

}

LD  :    '{';
RD  :    '}';
LOOP    :    '#';  
END_LOOP:   '/';
PARTIAL :   '>';
fragment DIGIT  : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');
IDENT : (LETTER | '_') (LETTER | '_' | DIGIT)*;
BUFFER options {greedy=false;} : ~(LD | RD)+ ;

/* PARSER RULES */
start   : body EOF
;

body    : (tag | loop | partial | BUFFER)*
;

tag     : LD! IDENT^ RD!
;

loop    : LD! LOOP^ IDENT RD!
  body
  LD! END_LOOP! IDENT RD!
;

 partial : LD! PARTIAL^ IDENT RD!
;

buffer  : BUFFER 
;

Recommended answer

Your lexer tokenizes independently from your parser. If your parser tries to match a BUFFER token, the lexer does not take this info into account. In your case with input like: "blah blah blah", the lexer creates 3 IDENT tokens, not a single BUFFER token.
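
To make the problem concrete, here is a minimal sketch of a tokenizer that, like the lexer described above, has no knowledge of what the parser expects. It is a hand-written stand-in, not ANTLR-generated code: the class name ContextFreeLexerDemo, the OTHER token type, and the regular expression are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContextFreeLexerDemo {
    // Hypothetical token patterns: braces, identifiers, and a catch-all
    // for any other run of characters. No context is tracked anywhere.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<LD>\\{)|(?<RD>\\})|(?<IDENT>[A-Za-z_][A-Za-z_0-9]*)|(?<OTHER>[^{}A-Za-z_]+)");

    public static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        while (m.find()) {
            if (m.group("LD") != null) {
                out.add("LD");
            } else if (m.group("RD") != null) {
                out.add("RD");
            } else if (m.group("IDENT") != null) {
                // Words are always IDENT, whether inside or outside a tag.
                out.add("IDENT(" + m.group("IDENT") + ")");
            } else {
                out.add("OTHER(" + m.group("OTHER") + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The text between the tags comes out as three separate IDENT tokens.
        System.out.println(tokenize("{bar}blah blah blah{zed}"));
    }
}
```

Because the same rule fires for a word regardless of where it appears, the text between the tags can never come out as one buffer token.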

What you need to "tell" your lexer is that when you're inside a tag (i.e. you have just encountered an LD token), an IDENT token should be created, and when you're outside a tag (i.e. you have just encountered an RD token), a BUFFER token should be created instead of an IDENT token.

In order to implement this, you need to:

1. create a boolean flag inside the lexer that keeps track of whether you're inside or outside a tag. This can be done inside the @lexer::members { ... } section of your grammar;

2. after the lexer creates either an LD or an RD token, flip the boolean flag from (1). This can be done in the @after{ ... } section of the lexer rules;

3. before creating a BUFFER token inside the lexer, check whether you're outside a tag at the moment. This can be done by using a semantic predicate at the start of your lexer rule.
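
The three steps above can be sketched as a small hand-written lexer in plain Java. This is only an illustration of the flag technique under stated assumptions, not what ANTLR generates: the class name FlagLexerDemo, the string form of the tokens, and the choice to skip whitespace inside tags are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagLexerDemo {
    // Step 1: a flag that records whether we are inside a tag.
    private boolean insideTag = false;

    private static boolean isIdentStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static boolean isIdentPart(char c) {
        return isIdentStart(c) || (c >= '0' && c <= '9');
    }

    public List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '{') {
                out.add("LD");
                insideTag = true;   // Step 2: set the flag after LD.
                i++;
            } else if (c == '}') {
                out.add("RD");
                insideTag = false;  // Step 2: clear the flag after RD.
                i++;
            } else if (!insideTag) {
                // Step 3: only outside a tag, take everything up to the
                // next brace as a single BUFFER token.
                int start = i;
                while (i < src.length() && src.charAt(i) != '{' && src.charAt(i) != '}') {
                    i++;
                }
                out.add("BUFFER(" + src.substring(start, i) + ")");
            } else if (c == '#') {
                out.add("LOOP");
                i++;
            } else if (c == '/') {
                out.add("END_LOOP");
                i++;
            } else if (c == '>') {
                out.add("PARTIAL");
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;                // Assumption: whitespace inside a tag is skipped.
            } else if (isIdentStart(c)) {
                int start = i;
                while (i < src.length() && isIdentPart(src.charAt(i))) {
                    i++;
                }
                out.add("IDENT(" + src.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new FlagLexerDemo().tokenize("{#bar}blah blah blah{zed}{/bar}"));
    }
}
```

The important part is the ordering of the checks: the brace rules always apply, the buffer rule is guarded by the flag, and the identifier rule is only reachable while the flag is set.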

A short demo:

grammar g;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

@lexer::members {
  private boolean insideTag = false;
}

start   : body EOF -> body
;

body    : (tag | loop | partial | BUFFER)*
;

tag     : LD IDENT RD -> IDENT
;

loop    : LD LOOP IDENT RD body LD END_LOOP IDENT RD -> ^(LOOP body IDENT IDENT)
;

partial : LD PARTIAL IDENT RD -> ^(PARTIAL IDENT)
;

LD @after{insideTag=true;}  : '{';
RD @after{insideTag=false;} : '}';
LOOP     : '#';
END_LOOP : '/';
PARTIAL  : '>';
SPACE    : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
IDENT    : (LETTER | '_') (LETTER | '_' | DIGIT)*;
BUFFER   : {!insideTag}?=> ~(LD | RD)+;
fragment DIGIT  : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');

(Note that you probably want to discard spaces between tags, so I added a SPACE rule that discards them.)

Test it with the following class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "{foo}{#bar}blah blah blah{zed}{/bar}{>foo2}{#bar2}" +
        "This Should Be Parsed as a Buffer.{/bar2}";
    gLexer lexer = new gLexer(new ANTLRStringStream(src));
    gParser parser = new gParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.start().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

Then generate the lexer and parser, compile everything, and run the Main class:

Linux/macOS

java -cp antlr-3.3.jar org.antlr.Tool g.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

Windows

java -cp antlr-3.3.jar org.antlr.Tool g.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main

You'll see some DOT-source being printed to the console, which corresponds to the following AST:

(image created using graphviz-dev.appspot.com)
