How do I lex this input?


Question

I currently have a working, simple language implemented in Java using ANTLR. What I want to do is embed it in plain text, in a similar fashion to PHP.

For example:

Lorem ipsum dolor sit amet
<% print('consectetur adipiscing elit'); %>
Phasellus volutpat dignissim sapien.

I anticipate that the resulting token stream would look something like:

CDATA OPEN PRINT OPAREN APOS STRING APOS CPAREN SEMI CLOSE CDATA

How can I achieve this, or is there a better way?

There is no restriction on what might be outside the <% block. I assumed something like <% print('%>'); %>, as per Michael Mrozek's answer, would be possible, but outside of a situation like that, <% would always indicate the start of a code block.

I developed a solution based on ideas given in Michael Mrozek's answer, simulating Flex's start conditions using ANTLR's gated semantic predicates:

lexer grammar Lexer;

@members {
    boolean codeMode = false;
}

OPEN    : {!codeMode}?=> '<%' { codeMode = true; } ;
CLOSE   : {codeMode}?=> '%>' { codeMode = false;} ;
LPAREN  : {codeMode}?=> '(';
//etc.

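// '~' negates only a single character or set, so ~('<%') cannot mean
// "anything but <%". Assumed fix: match any one character outside code
// mode and let OPEN, listed first, claim "<%".
CHAR    : {!codeMode}?=> . ;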


parser grammar Parser;

options {
    tokenVocab = Lexer;
    output = AST;
}

tokens {
    VERBATIM;
}

program :
    (code | verbatim)+
    ;   

code :
    OPEN statement+ CLOSE -> statement+
    ;

verbatim :
    CHAR -> ^(VERBATIM CHAR)
    ;
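
The grammar above produces one CHAR token per character of plain text, whereas the token stream anticipated in the question has a single CDATA for each run of text. One possible way to bridge the two is a small post-processing pass over the lexer output. The sketch below is plain Java and does not use the ANTLR runtime; `VerbatimMerger`, `Tok`, and `mergeChars` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collapses runs of CHAR tokens into single CDATA tokens. */
class VerbatimMerger {

    /** Minimal stand-in for a lexer token: a type name plus the matched text. */
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static List<Tok> mergeChars(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Tok t : in) {
            if (t.type.equals("CHAR")) {
                run.append(t.text);              // keep accumulating plain text
            } else {
                flush(run, out);                 // a non-CHAR token ends the current run
                out.add(t);
            }
        }
        flush(run, out);                         // trailing text after the last code block
        return out;
    }

    /** Emits the accumulated text as one CDATA token, if there is any. */
    private static void flush(StringBuilder run, List<Tok> out) {
        if (run.length() > 0) {
            out.add(new Tok("CDATA", run.toString()));
            run.setLength(0);
        }
    }
}
```

Doing the merge after lexing keeps the lexer rules simple; the trade-off is an extra pass over the token list.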

Recommended answer

The actual concept looks fine, although it's unlikely you'd have a PRINT token; the lexer would probably emit something like IDENTIFIER, and the parser would be responsible for figuring out that it's a function call (e.g. by looking for IDENTIFIER OPAREN ... CPAREN) and doing the appropriate thing.

As for how to do it, I don't know anything about ANTLR, but it probably has something like flex's start conditions. If so, you can have the INITIAL start condition do nothing but look for <%, which would switch to the CODE state where all the actual tokens are defined; then '%>' would switch back. In flex it would be:

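%x CODE

%%

<INITIAL>{
    "<%"      {BEGIN(CODE);}
    .|\n      { /* plain text outside a code block; ignored here */ }
}

 /* CODE is declared exclusive (%x), so the rules below are active only
    between <% and %>. With an inclusive %s condition, rules written
    without a prefix would also be active in INITIAL.
  */
<CODE>{
    "%>"      {BEGIN(INITIAL);}
    "("       {return OPAREN;}
    "'"       {return APOS;}
    ...
}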
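
Since the language in the question is implemented in Java, the same two-state idea can also be sketched by hand as a small state machine. This is only an illustration under assumptions: `ModeLexer` and its token names are hypothetical, only a handful of code-mode tokens are recognized, and string literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-mode scanner: TEXT outside <% ... %>, CODE inside. */
class ModeLexer {
    enum Mode { TEXT, CODE }

    /** Returns token names only; a real lexer would also carry the matched text. */
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        Mode mode = Mode.TEXT;
        int i = 0;
        while (i < input.length()) {
            if (mode == Mode.TEXT) {
                int open = input.indexOf("<%", i);
                int end = (open < 0) ? input.length() : open;
                if (end > i) tokens.add("CDATA");     // everything up to "<%" is one chunk
                if (open < 0) break;                  // no more code blocks
                tokens.add("OPEN");
                mode = Mode.CODE;                     // plays the role of BEGIN(CODE)
                i = open + 2;
            } else {
                char c = input.charAt(i);
                if (input.startsWith("%>", i)) {
                    tokens.add("CLOSE");
                    mode = Mode.TEXT;                 // plays the role of BEGIN(INITIAL)
                    i += 2;
                } else if (Character.isWhitespace(c)) {
                    i++;                              // whitespace is skipped in code mode
                } else if (Character.isLetter(c)) {
                    int j = i;
                    while (j < input.length() && Character.isLetterOrDigit(input.charAt(j))) j++;
                    tokens.add("IDENTIFIER");
                    i = j;
                } else if (c == '(') { tokens.add("OPAREN"); i++; }
                else if (c == ')') { tokens.add("CPAREN"); i++; }
                else if (c == ';') { tokens.add("SEMI"); i++; }
                else {
                    throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
                }
            }
        }
        return tokens;
    }
}
```

For example, `ModeLexer.lex("Lorem <% print(); %> ipsum")` yields CDATA, OPEN, IDENTIFIER, OPAREN, CPAREN, SEMI, CLOSE, CDATA, with `print` coming out as an IDENTIFIER as suggested above.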

You need to be careful about things like matching %> in a context where it's not a closing marker, like within a string; it's up to you if you want to allow <% print('%>'); %>, but most likely you do.
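
A minimal Java sketch of that string-aware check follows. It assumes single-quoted string literals in which a backslash escapes the next character; the escape rule, like the names `CloseFinder` and `findClose`, is an assumption for illustration and not something the question specifies.

```java
/** Hypothetical helper: locates the "%>" that really closes a code block. */
class CloseFinder {

    /**
     * Returns the index of the first "%>" at or after 'from' that is not inside
     * a single-quoted string, or -1 if there is none.
     * Assumption: inside a string, a backslash escapes the following character.
     */
    static int findClose(String s, int from) {
        boolean inString = false;
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                        // skip the escaped character
                } else if (c == '\'') {
                    inString = false;           // closing quote
                }
            } else if (c == '\'') {
                inString = true;                // opening quote
            } else if (c == '%' && i + 1 < s.length() && s.charAt(i + 1) == '>') {
                return i;                       // a real closing marker
            }
        }
        return -1;
    }
}
```

In a generated lexer the same effect usually comes from giving string literals their own rule, so that the characters of '%>' inside quotes are consumed as part of the string before the closing-marker rule can see them.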
