How do I lex this input?
Question
I currently have a working, simple language implemented in Java using ANTLR. What I want to do is embed it in plain text, in a similar fashion to PHP.
For example:
Lorem ipsum dolor sit amet
<% print('consectetur adipiscing elit'); %>
Phasellus volutpat dignissim sapien.
I anticipate that the resulting token stream would look something like:
CDATA OPEN PRINT OPAREN APOS STRING APOS CPAREN SEMI CLOSE CDATA
How can I achieve this, or is there a better way?
There is no restriction on what might appear outside the <% block. I assumed something like <% print('%>'); %>, as per Michael Mrozek's answer, would be possible, but outside of a situation like that, <% would always indicate the start of a code block.
I developed a solution based on ideas given in Michael Mrozek's answer, simulating Flex's start conditions using ANTLR's gated semantic predicates:
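lexer grammar Lexer;

@members {
    // true while scanning between '<%' and '%>'
    boolean codeMode = false;
}

OPEN   : {!codeMode}?=> '<%' { codeMode = true; } ;
CLOSE  : {codeMode}?=>  '%>' { codeMode = false; } ;

LPAREN : {codeMode}?=> '(' ;
// etc.

// NOTE: in ANTLR 3 the ~ operator negates single characters or character
// sets, so ~('<%') may be rejected; this rule may need to match one
// character at a time instead.
CHAR   : {!codeMode}?=> ~('<%') ;

parser grammar Parser;

options {
    tokenVocab = Lexer;
    output = AST;
}

tokens {
    VERBATIM;
}

program
    : (code | verbatim)+
    ;

// Drop the OPEN/CLOSE markers and keep only the statements.
code
    : OPEN statement+ CLOSE -> statement+
    ;

// Wrap each literal character in a VERBATIM node.
verbatim
    : CHAR -> ^(VERBATIM CHAR)
    ;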
Recommended answer
The actual concept looks fine, although it's unlikely you'd have a PRINT token; the lexer would probably emit something like IDENTIFIER, and the parser would be responsible for figuring out that it's a function call (e.g. by looking for IDENTIFIER OPAREN ... CPAREN) and doing the appropriate thing.
As for how to do it, I don't know anything about ANTLR, but it probably has something like flex's start conditions. If so, you can have the INITIAL start condition do nothing but look for <%, which would switch to the CODE state where all the actual tokens are defined; then %> would switch back. In flex it would be:
%s CODE
%%
<INITIAL>{
    "<%"    {BEGIN(CODE);}
    .       {}
}
    /* These rules are active in CODE because it was declared %s (inclusive).
       They are also active in INITIAL; declare CODE with %x and wrap them
       in <CODE>{} if they should apply only inside a code block.
    */
"%>"    {BEGIN(INITIAL);}
"("     {return OPAREN;}
"'"     {return APOS;}
...
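Since the language in question is implemented in Java, the same two-state idea can be sketched there without any lexer generator. This is only a rough illustration, not how ANTLR or flex work internally: the class `ModeScanner`, its `Segment` type, and the `split` method are hypothetical names, and the sketch deliberately does not treat string literals specially, so a `%>` inside quotes would end the code block early.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-state scanner; all names here are illustrative. */
final class ModeScanner {

    /** A chunk of input tagged with the state it was read in. */
    static final class Segment {
        final boolean code;   // true if read in the CODE state
        final String text;

        Segment(boolean code, String text) {
            this.code = code;
            this.text = text;
        }
    }

    /** Splits the input into plain-text and code segments, dropping the markers. */
    static List<Segment> split(String input) {
        List<Segment> out = new ArrayList<>();
        boolean codeMode = false;   // false = INITIAL, true = CODE
        int pos = 0;
        while (pos < input.length()) {
            // Each state only looks for the marker that leaves it.
            String marker = codeMode ? "%>" : "<%";
            int end = input.indexOf(marker, pos);
            if (end < 0) {
                end = input.length();   // no marker: the rest stays in this state
            }
            if (end > pos) {
                out.add(new Segment(codeMode, input.substring(pos, end)));
            }
            if (end < input.length()) {
                codeMode = !codeMode;   // marker found: switch state
            }
            pos = Math.min(end + marker.length(), input.length());
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "Lorem ipsum\n<% print('x'); %>\nPhasellus";
        for (Segment s : split(src)) {
            System.out.println((s.code ? "CODE  " : "CDATA ") + s.text.trim());
        }
    }
}
```

In a real lexer the CODE segments would be tokenized further (IDENTIFIER, OPAREN, and so on) rather than returned as raw text.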
You need to be careful about things like matching %> in a context where it's not a closing marker, like within a string; it's up to you if you want to allow <% print('%>'); %>, but most likely you do.
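One way to picture the string caveat is a small Java helper that searches for the closing marker while skipping over quoted text. This is a sketch under stated assumptions: the class `CloseFinder` and the method `findClose` are hypothetical, and it assumes strings are delimited by single quotes with backslash escapes, which the question does not actually specify.

```java
/** Hypothetical helper; the quoting rules are an assumption, not taken from the question. */
final class CloseFinder {

    /**
     * Returns the index of the first "%>" at or after 'from' that is not
     * inside a single-quoted string, or -1 if there is none.
     */
    static int findClose(String input, int from) {
        boolean inString = false;
        for (int i = from; i < input.length(); i++) {
            char c = input.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                    // skip the escaped character
                } else if (c == '\'') {
                    inString = false;       // closing quote
                }
            } else if (c == '\'') {
                inString = true;            // opening quote
            } else if (c == '%' && i + 1 < input.length() && input.charAt(i + 1) == '>') {
                return i;                   // a real closing marker
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "<% print('%>'); %>";
        int close = findClose(src, 2);      // start just after the opening "<%"
        System.out.println(src.substring(2, close));
    }
}
```

In a generated lexer the same effect usually comes for free once the string literal is matched as a single token, because the `%>` is consumed as part of the string before the CLOSE rule can see it.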