HTML/Markdown style grammar for ANTLR4


Question

I want to define an HTML/Markdown-like grammar for a document that gets transformed into an AST. I'm aware that ANTLR4 is not the best tool for Markdown, but what I have in mind is much closer to HTML. At least I think it is. :)

Here's my lexer definition:

lexer grammar dnpMDLexer;

NL
    : [\r\n]
    ;

HEAD_TAG
    : '#'
    ;

HEADING_TEXT
    : ('\\#'|~[#`\r\n])+
    ;

ITALIC_TAG
    : '*'
    ;

ITALIC_TEXT
    : ('\\*'|~[#`*\r\n]).+?
    ;

LISTING_TAG
    : '`'
    ;

RUNNING_TEXT
    : ('\\#'|'\\`'|'\\*'|~[#*`])+
    ;

Here's my parser definition:

parser grammar dnpMDParser;

options { tokenVocab=dnpMDLexer; }

dnpMD
    : subheadline headline lead body
    ;

subheadline
    : HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG NL
    ;

headline
    : HEAD_TAG HEADING_TEXT HEAD_TAG NL
    ;

lead
    : HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG
    ;

subheading
    : HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG
    ;

listing
     : LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG .+? LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG
     ;

italic
    : ITALIC_TAG ITALIC_TEXT ITALIC_TAG
    ;

body
    : RUNNING_TEXT body
    | subheading body
    | listing body
    | italic body
    | EOF
    ;

I tried this in ANTLRWorks 2 and in IntelliJ with the ANTLR4 plugin.

I'm having serious problems with the listing and italic rules: they match far too much in some cases and nothing at all in others. In the version above, the italic style does not work.

Am I heading in the right direction? I tried to use the HTML grammar as a template. I'm not sure whether ANTLR4 lexer modes could help me distinguish between text outside tags and text inside them.

Maybe someone has some useful hints. I'm thankful for any hint I can get, because I'm not 100% sure that my approach to this problem is leading me in the right direction.

Here's an image of the TestRig within ANTLRWorks 2. The second italic rule is matching way too much.

Thanks, Fabian
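
A plausible explanation for the italic rule never firing is the way an ANTLR lexer chooses tokens: at each position it takes the rule that matches the most input, and on a tie the rule defined first wins, regardless of what the parser expects. The sketch below is not ANTLR itself. It is a rough Python approximation of the lexer above, with each rule rewritten as a regular expression (the `RULES` table and the `tokenize` helper are hypothetical names, and the trailing `.+?` of `ITALIC_TEXT` is assumed to stop after one character). It suggests that `HEADING_TEXT`, which does not exclude `*`, swallows a whole line before `ITALIC_TAG` gets a chance.

```python
import re

# The question's lexer rules in definition order, approximated as regexes.
RULES = [
    ("NL", r"[\r\n]"),
    ("HEAD_TAG", r"#"),
    ("HEADING_TEXT", r"(?:\\#|[^#`\r\n])+"),
    ("ITALIC_TAG", r"\*"),
    # Assumption: a non-greedy '.+?' at the end of a rule consumes one character.
    ("ITALIC_TEXT", r"(?:\\\*|[^#`*\r\n])[\s\S]"),
    ("LISTING_TAG", r"`"),
    ("RUNNING_TEXT", r"(?:\\#|\\`|\\\*|[^#*`])+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Longest match wins; on a tie the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("some *italic* text"))
# [('HEADING_TEXT', 'some *italic* text')]
```

Under these assumptions the line becomes a single `HEADING_TEXT` token, so the parser rule `italic` never sees an `ITALIC_TAG`.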

Recommended answer

The current solution uses the following lexer and parser grammars:

lexer grammar dnpMDAuslagernLexer;

/*@members {
    public static final int COMMENTS = 1;
}*/

NL
    : [\r\n]
    ;

SUBHEADLINE
    : '##' (~[\r\n])+? '##'
    ;

HEADLINE
    : '#' ('\\#'|~[\r\n])+? '#'
    ;

LEAD
    : '###' (~[\r\n])+? '###'
    ;

SUBHEADING
    : '####' (~[\r\n])+? '####'
    ;

CAPTION
    : '#####' (~[\r\n])+? '#####'
    ;

LISTING
    : '~~~~~' .+? '~~~~~'
    ;

ELEMENTPATH
    : '[[[[[' (~[\r\n])+? ']]]]]'
    ;

LABELREF
    : '{##' (~[\r\n])+? '##}'
    ;

LABEL
    : '{#' (~[\r\n])+? '#}'
    ;

ITALIC
    : '*' (~[\r\n])+? '*'
    ;

SINGLE_COMMENT
    : '//' (~[\r\n])+ -> channel(1)
    ;

MULTI_COMMENT
    : '/*' .*? '*/' -> channel(1)
    ;

STAR
    : '*'
    ;

BRACE_OPEN
    : '{'
    ;

TEXT
    : (~[\r\n*{])+
    ;

parser grammar dnpMDAuslagernParser;

options { tokenVocab=dnpMDAuslagernLexer; }

dnpMD
    : head body
    ;

head
    : subheadline headline lead
    ;

subheadline
    : SUBHEADLINE NL+
    ;

headline
    : HEADLINE NL+
    ;

lead
    : LEAD
    ;

subheading
    : SUBHEADING
    ;

caption
    : CAPTION
    ;

listing
    : LISTING (NL listingPath)? (NL label)? NL caption
    ;

image
    : caption (NL label)? (NL imagePath)?
    ;

listingPath
    : ELEMENTPATH
    ;

imagePath
    : ELEMENTPATH
    ;

labelRef
    : LABELREF
    ;

label
    : LABEL
    ;

italic
    : ITALIC
    ;

singleComment
    : SINGLE_COMMENT
    ;

multiComment
    : MULTI_COMMENT
    ;

paragraph
    : TEXT? italic TEXT?
    | TEXT? STAR TEXT?
    | TEXT? labelRef TEXT?
    | TEXT? BRACE_OPEN TEXT?
    | TEXT? LABEL TEXT?
    | ELEMENTPATH
    | TEXT
    ;

newlines
    : NL+
    ;

body
    : bodyElements+
    ;

bodyElements
    : singleComment
    | multiComment
    | paragraph
    | subheading
    | listing
    | image
    | newlines
    ;

This language is working fine, and maybe someone can benefit from it.
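
The main change in this grammar appears to be that each delimited construct, such as `ITALIC`, is matched as one complete token, with `STAR` and `BRACE_OPEN` as fallbacks for delimiters that never close. The sketch below imitates that idea in Python for a small subset of the rules. It is an approximation with hypothetical names (`RULES`, `tokenize`), not output from ANTLR, and it assumes longest-match selection with ties going to the rule listed first.

```python
import re

# A subset of the answer's lexer rules, approximated as regexes.
RULES = [
    ("NL", r"[\r\n]"),
    ("ITALIC", r"\*[^\r\n]+?\*"),   # whole '*...*' span as a single token
    ("STAR", r"\*"),                # fallback for a lone '*'
    ("BRACE_OPEN", r"\{"),
    ("TEXT", r"[^\r\n*{]+"),        # plain text stops at '*' and '{'
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Longest match wins; on a tie the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("some *italic* text"))
# [('TEXT', 'some '), ('ITALIC', '*italic*'), ('TEXT', ' text')]
print(tokenize("2 * 3"))
# [('TEXT', '2 '), ('STAR', '*'), ('TEXT', ' 3')]
```

Because `TEXT` excludes `*` and `{`, it can no longer absorb the start of an italic span or a label, which is the overlap that caused trouble in the original lexer.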

Thanks to all who helped out! Fabian
