ANTLR Nested Functions

Question

Is ANTLR right for this project?

I'm looking to process and transform a string entered by a user that may include custom functions. For example, the user might write something like $CAPITALIZE('word') in a string, and I want to perform the actual transformation in the background using StringUtils.

I would imagine the users will sometimes write nested functions like:

$RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'),6),3,'0')

Where the expected output would be a string value of 'A12345000'.

I tried using regex to split the functions apart, but once nested, it wasn't so easy. I figured I might try writing my own parser, and while doing research I came across an article that suggested using ANTLR instead.

Is this something ANTLR would be right for? If so, are there any similar examples I could look at? Or would someone be kind enough to show me how I might write this in ANTLR, so that the custom functions can be processed both individually and when nested?

Functions

  • $CAPITALIZE(String str)
  • $INDEX_OF(String seq, String searchSeq)
  • $LEFT(String str, int len)
  • $LEFT_PAD(String str, int size, char padChar)
  • $LOWERCASE(String str)
  • $RIGHT(String str, int len)
  • $RIGHT_PAD(String str, int size, char padChar)
  • $STRIP(String str)
  • $STRIP_ACCENTS(String input)
  • $SUBSTRING(String str, int start)
  • $SUBSTRING(String str, int start, int end)
  • $TRIM(String str)
  • $TRUNCATE(String str, int maxWidth)
  • $UPPERCASE(String str)

Basic examples:

  • $CAPITALIZE('word') → 'Word'
  • $INDEX_OF('word', 'r') → 2
  • $LEFT('0123456789',6) → '012345'
  • $LEFT_PAD('0123456789',3, '0') → '0000123456789'
  • $LOWERCASE('WoRd') → 'word'
  • $RIGHT('0123456789',6) → '456789'
  • $RIGHT_PAD('0123456789',3, '0') → '0123456789000'
  • $STRIP(' word ') → 'word'
  • $STRIP_ACCENTS('wórd') → 'word'
  • $SUBSTRING('word', 1) → 'ord'
  • $SUBSTRING('word', 0, 2) → 'wor'
  • $TRIM('word ') → 'word'
  • $TRUNCATE('more words', 3) → 'more'
  • $UPPERCASE('word') → 'WORD'

Nested examples:

  • $LEFT_PAD($LEFT('123456789',6),3,'0') → '000123456'
  • $RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'),6),3,'0') → 'A12345000'

Actual example: By "actual example" I mean what I expect a real string value to look like. Note the variables written like ${var}. These will be replaced with actual string values using Apache Commons StringSubstitutor before the string is passed to ANTLR (if it turns out I should use ANTLR).

Initial string written by user:

\HomeDir\Students\$RIGHT(${graduation.year},2)\$LEFT_PAD($LEFT(${state.id},6),3,'0')

String after being processed by StringSubstitutor:

\HomeDir\Students\$RIGHT('2020',2)\$LEFT_PAD($LEFT('123456789',6),3,'0')

String after being processed by ANTLR (and my final output):

\HomeDir\Students\20\000123456

Does ANTLR seem like something I should use for this project, or would something else be better suited?

Recommended answer

Yes, ANTLR would be a good choice. Keep in mind that ANTLR only does the parsing for you, and provides you with a mechanism to traverse the generated parse tree. You will have to write code to evaluate the expressions.
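To make the "you will have to write code to evaluate the expressions" part more concrete, here is a minimal sketch of a function table in plain Java (11 or later). The class name FunctionTable, the call helper, and the small hand-written string helpers are assumptions for illustration only; they stand in for StringUtils. Padding follows the examples in the question, where $LEFT_PAD prepends `size` pad characters. That is not what StringUtils.leftPad does (it pads up to a total width), so a direct delegation would need adjusting.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FunctionTable {
    // Hypothetical registry: function name -> implementation over already-evaluated arguments.
    static final Map<String, Function<List<Object>, Object>> FUNCTIONS = Map.of(
        "CAPITALIZE", a -> capitalize(str(a, 0)),
        "UPPERCASE",  a -> str(a, 0).toUpperCase(),
        "LOWERCASE",  a -> str(a, 0).toLowerCase(),
        "LEFT",       a -> left(str(a, 0), num(a, 1)),
        "RIGHT",      a -> right(str(a, 0), num(a, 1)),
        // Question semantics: add `size` pad characters, not "pad to width".
        "LEFT_PAD",   a -> str(a, 2).repeat(num(a, 1)) + str(a, 0),
        "RIGHT_PAD",  a -> str(a, 0) + str(a, 2).repeat(num(a, 1))
    );

    static String str(List<Object> args, int i) { return String.valueOf(args.get(i)); }

    static int num(List<Object> args, int i) { return ((Number) args.get(i)).intValue(); }

    static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    static String left(String s, int len) { return s.substring(0, Math.min(len, s.length())); }

    static String right(String s, int len) { return s.substring(Math.max(0, s.length() - len)); }

    // Looks up a function by name and applies it to the given arguments.
    static Object call(String name, Object... args) {
        Function<List<Object>, Object> f = FUNCTIONS.get(name);
        if (f == null) throw new IllegalArgumentException("Unknown function: " + name);
        return f.apply(List.of(args));
    }

    public static void main(String[] args) {
        // Nested calls are evaluated inside-out: the inner result becomes an argument.
        System.out.println(call("LEFT_PAD", call("LEFT", "123456789", 6), 3, "0"));
    }
}
```

With a table like this, the code that walks the parse tree only has to evaluate each parameter and then look up the function by its ID.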

In your case, the lexer needs to change state when it encounters a '$': it pushes an "in-a-function" mode onto its mode stack. When it then sees a ')', it pops one such "in-a-function" mode off the stack.

Read all about lexical modes/stack on the ANTLR wiki: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md
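The push/pop idea can be tried out without ANTLR. The following is only a hand-written simulation in plain Java, not what ANTLR generates; the class ModeStackDemo and its segments method are made-up names. It pushes a mode on every '$' and pops one on every ')', and uses the stack depth to split the input into plain text and complete top-level function calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ModeStackDemo {
    enum Mode { IN_FUNCTION }

    // Splits input into plain-text segments and complete top-level $FUNCTION(...) calls.
    static List<String> segments(String input) {
        Deque<Mode> modes = new ArrayDeque<>();   // empty stack = default (text) mode
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : input.toCharArray()) {
            if (modes.isEmpty()) {
                if (c == '$') {                   // first '$': leave text mode
                    flush(out, current);
                    modes.push(Mode.IN_FUNCTION);
                }
                current.append(c);
                continue;
            }
            current.append(c);
            if (c == '\'') {
                inQuote = !inQuote;               // '$' and ')' inside a string literal are ignored
            } else if (!inQuote && c == '$') {
                modes.push(Mode.IN_FUNCTION);     // nested call: push the mode again
            } else if (!inQuote && c == ')') {
                modes.pop();                      // closing parenthesis: pop one level
                if (modes.isEmpty()) flush(out, current);
            }
        }
        flush(out, current);
        return out;
    }

    // Adds the buffered characters as one segment, if there are any.
    private static void flush(List<String> out, StringBuilder sb) {
        if (sb.length() > 0) {
            out.add(sb.toString());
            sb.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(segments("foo $RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'), 6), 3, '0') bar"));
    }
}
```

Because every nested '$' pushes and every ')' pops, the stack is empty again exactly when the outermost call is closed, which is why plain text after the call is recognized as text again.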

Here's a quick demo of how that could work in ANTLR4 (ANTLR3 doesn't support lexical modes):

File: TLexer.g4

lexer grammar TLexer;

TEXT
 : ~[$]
 ;

FUNCTION_START
 : '$' -> pushMode(IN_FUNCTION), skip
 ;

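// Everything after a '$' is tokenized in this mode until the matching ')'.
mode IN_FUNCTION;
  // A nested '$' pushes the mode again, so each ')' pops exactly one level.
  FUNCTION_NESTED : '$' -> pushMode(IN_FUNCTION), skip;
  ID              : [a-zA-Z_]+;
  PAR_OPEN        : '(';
  PAR_CLOSE       : ')' -> popMode;
  NUMBER          : [0-9]+;
  // A quote inside a string literal is escaped by doubling it: ''
  STRING          : '\'' ( ~'\'' | '\'\'' )* '\'';
  COMMA           : ',';
  SPACE           : [ \t\r\n] -> skip;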

File: TParser.g4

parser grammar TParser;

options {
  tokenVocab=TLexer;
}

parse
 : atom* EOF
 ;

atom
 : text
 | function
 ;

text
 : TEXT+
 ;

function
 : ID params
 ;

params
 : PAR_OPEN ( param ( COMMA param )* )? PAR_CLOSE
 ;

param
 : NUMBER
 | STRING
 | function
 ;
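To show how evaluation follows the shape of these rules, here is a rough sketch that walks the same structure (parse, function, params, param) by hand in plain Java 11+. It is a hand-written recursive-descent evaluator, not code generated by ANTLR, and with ANTLR you would normally put the same logic in a visitor or listener over the parse tree. The class name TemplateEvaluator and the four supported functions are assumptions; padding again follows the question's examples.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateEvaluator {
    private final String src;
    private int pos;

    private TemplateEvaluator(String src) { this.src = src; }

    // parse : atom* EOF   (an atom is either one text character or a '$' function call)
    static String evaluate(String input) {
        TemplateEvaluator e = new TemplateEvaluator(input);
        StringBuilder out = new StringBuilder();
        while (e.pos < input.length()) {
            if (e.peek() == '$') { e.pos++; out.append(e.function()); }
            else out.append(input.charAt(e.pos++));
        }
        return out.toString();
    }

    // function : ID params
    private Object function() {
        int start = pos;
        while (Character.isLetter(peek()) || peek() == '_') pos++;
        return apply(src.substring(start, pos), params());
    }

    // params : PAR_OPEN ( param ( COMMA param )* )? PAR_CLOSE
    private List<Object> params() {
        List<Object> args = new ArrayList<>();
        expect('(');
        skipSpaces();
        if (peek() != ')') {
            args.add(param());
            skipSpaces();
            while (peek() == ',') { pos++; args.add(param()); skipSpaces(); }
        }
        expect(')');
        return args;
    }

    // param : NUMBER | STRING | function
    private Object param() {
        skipSpaces();
        if (peek() == '$') { pos++; return function(); }
        if (peek() == '\'') {
            pos++;                                         // opening quote
            StringBuilder sb = new StringBuilder();
            while (pos < src.length()) {
                char ch = src.charAt(pos++);
                if (ch != '\'') { sb.append(ch); continue; }
                if (peek() != '\'') return sb.toString();  // closing quote
                sb.append('\''); pos++;                    // '' is an escaped quote
            }
            throw new IllegalArgumentException("Unterminated string");
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a parameter at position " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    // Only four functions, with the semantics used in the question's examples.
    private static Object apply(String name, List<Object> a) {
        String s = String.valueOf(a.get(0));
        switch (name) {
            case "CAPITALIZE": return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
            case "LEFT":       return s.substring(0, Math.min((Integer) a.get(1), s.length()));
            case "RIGHT":      return s.substring(Math.max(0, s.length() - (Integer) a.get(1)));
            case "LEFT_PAD":   return a.get(2).toString().repeat((Integer) a.get(1)) + s;
            default: throw new IllegalArgumentException("Unknown function: " + name);
        }
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipSpaces() { while (Character.isWhitespace(peek())) pos++; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("\\HomeDir\\Students\\$RIGHT('2020',2)\\$LEFT_PAD($LEFT('123456789',6),3,'0')"));
    }
}
```

Each method corresponds to one parser rule, so a nested function in a param is handled by the same recursion that the grammar expresses.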

With the ANTLR4 plugin for IntelliJ, you can easily test the parser's parse rule by feeding it the following input: foo $RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'), 6), 3, '0') bar. The plugin then displays the resulting parse tree.

[Image: parse tree for the input above]
