How do I lex this input?

Views: 21 Published: 2021/11/11 3:44:19 Tags: antlr lexical-analysis


Problem description

I currently have a working, simple language implemented in Java using ANTLR. What I want to do is embed it in plain text, in a similar fashion to PHP.



For example:

Lorem ipsum dolor sit amet
<% print('consectetur adipiscing elit'); %>
Phasellus volutpat dignissim sapien.

I anticipate that the resulting token stream would look something like:

CDATA OPEN PRINT OPAREN APOS STRING APOS CPAREN SEMI CLOSE CDATA

How can I achieve this, or is there a better way?

There is no restriction on what might be outside the <% block. I assumed something like <% print('%>'); %>, as per Michael Mrozek's answer, would be possible, but outside of a situation like that, <% would always indicate the start of a code block.



Sample Implementation

I developed a solution based on ideas given in Michael Mrozek's answer, simulating Flex's start conditions using ANTLR's gated semantic predicates:
lexer grammar Lexer;

@members {
    boolean codeMode = false;
}

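// OPEN is only recognized in text mode, CLOSE only in code mode.
OPEN    : {!codeMode}?=> '<%' { codeMode = true; } ;
CLOSE   : {codeMode}?=> '%>' { codeMode = false; } ;
OPAREN  : {codeMode}?=> '(' ;
//etc.

// '~' can only negate single characters, so ~('<%') cannot mean "anything but <%".
// Match any single character instead; this relies on OPEN being listed first,
// so that '<%' should still be tokenized as OPEN rather than CHAR.
CHAR    : {!codeMode}?=> . ;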


parser grammar Parser;

options {
    tokenVocab = Lexer;
    output = AST;
}

tokens {
    VERBATIM;
}

program :
    (code | verbatim)+
    ;   

code :
    OPEN statement+ CLOSE -> statement+
    ;

verbatim :
    CHAR -> ^(VERBATIM CHAR)
    ;

Solution
The actual concept looks fine, although it's unlikely you'd have a PRINT token; the lexer would probably emit something like IDENTIFIER, and the parser would be responsible for figuring out that it's a function call (e.g. by looking for IDENTIFIER OPAREN ... CPAREN) and doing the appropriate thing.
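To illustrate the point about leaving call detection to the parser, here is a minimal Java sketch that checks whether a list of token names starts with the shape IDENTIFIER OPAREN ... CPAREN. The class name `CallShapeSketch`, the method `matchCall`, and the use of plain strings as token names are assumptions made for this example; a real ANTLR parser would express the same idea as a grammar rule.

```java
import java.util.List;

/** Illustrative only: recognizes the IDENTIFIER OPAREN ... CPAREN shape on token names. */
final class CallShapeSketch {

    /**
     * Returns the index of the CPAREN that closes the call starting at
     * {@code start}, or -1 if the tokens there do not form a call.
     */
    static int matchCall(List<String> tokens, int start) {
        if (start + 1 >= tokens.size()) {
            return -1;
        }
        if (!tokens.get(start).equals("IDENTIFIER")
                || !tokens.get(start + 1).equals("OPAREN")) {
            return -1;
        }
        int depth = 0; // tracks nested parentheses inside the argument list
        for (int i = start + 1; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals("OPAREN")) {
                depth++;
            } else if (t.equals("CPAREN") && --depth == 0) {
                return i;
            }
        }
        return -1; // ran out of tokens before the call was closed
    }
}
```

The lexer only needs to know that `print` is an identifier; whether it names a function is decided from the tokens around it.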

As for how to do it, I don't know anything about ANTLR, but it probably has something like flex's start conditions. If so, you can have the INITIAL start condition do nothing but look for <%, which would switch to the CODE state where all the actual tokens are defined; then '%>' would switch back. In flex it would be:
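%x CODE

%%

<INITIAL>{
    "<%"      { BEGIN(CODE); }
    .|\n      { /* plain text outside a code block; ignored here */ }
}

 /* CODE is declared exclusive (%x), so these rules are active only
    between <% and %>. With an inclusive %s declaration, rules without
    a start condition would also be active in INITIAL.
  */
<CODE>{
    "%>"      { BEGIN(INITIAL); }
    "("       { return OPAREN; }
    "'"       { return APOS; }
    ...
}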
You need to be careful about matching %> in a context where it is not a closing marker, such as inside a string literal. It's up to you whether to allow <% print('%>'); %>, but most likely you do.
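The mode-switching idea and the string caveat can be sketched together in plain Java, without ANTLR or flex. This is a hand-written illustration, not the asker's or the answerer's implementation: the class `ModeLexerSketch` and its `lex` method are hypothetical, string literals are emitted as a single STRING token rather than APOS STRING APOS, and escape sequences inside strings are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of a two-mode lexer; all names here are illustrative. */
final class ModeLexerSketch {

    /** Returns token names only, to keep the sketch short. */
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        boolean codeMode = false; // plays the role of the start condition
        int i = 0;
        while (i < input.length()) {
            if (!codeMode) {
                // Text mode: everything up to the next "<%" is one CDATA token.
                int open = input.indexOf("<%", i);
                int end = (open < 0) ? input.length() : open;
                if (end > i) {
                    tokens.add("CDATA");
                }
                if (open >= 0) {
                    tokens.add("OPEN");
                    codeMode = true;
                    end = open + 2;
                }
                i = end;
                continue;
            }
            char c = input.charAt(i);
            if (input.startsWith("%>", i)) {
                tokens.add("CLOSE");
                codeMode = false;
                i += 2;
            } else if (c == '\'') {
                // Consume the whole literal so a "%>" inside it is never seen as CLOSE.
                int close = input.indexOf('\'', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add("STRING");
                i = close + 1;
            } else if (Character.isLetter(c)) {
                int j = i;
                while (j < input.length() && Character.isLetterOrDigit(input.charAt(j))) {
                    j++;
                }
                tokens.add("IDENTIFIER");
                i = j;
            } else if (c == '(') {
                tokens.add("OPAREN");
                i++;
            } else if (c == ')') {
                tokens.add("CPAREN");
                i++;
            } else if (c == ';') {
                tokens.add("SEMI");
                i++;
            } else if (Character.isWhitespace(c)) {
                i++; // whitespace is skipped in code mode
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("Lorem <% print('%>'); %> ipsum"));
    }
}
```

Because the string branch consumes everything up to the closing quote in one step, the `%>` inside `'%>'` is never compared against the CLOSE rule. The same effect can be had in a generated lexer by making the string literal a single token rule.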
