ANTLR Nested Functions


Question

Is ANTLR right for this project?

I'm looking to process and transform a string entered by a user, which may include custom functions. For example, the user might write something like $CAPITALIZE('word') in a string, and I want to perform the actual transformation in the background using StringUtils.

I would imagine the users will sometimes write nested functions like:

$RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'),6),3,'0')

Where the expected output would be a string value of 'A12345000'.

I tried using regex to split the functions apart, but once nested, it wasn't so easy. I figured I might try writing my own parser, and while doing research I came across an article that suggested using ANTLR instead.

Is this something ANTLR would be right for? If so, are there any similar examples already available for me to look at? Or would someone be kind enough to give me an example of how I might write this out in ANTLR, so that my custom functions can be processed both individually and in a nested fashion?

Functions:

  • $CAPITALIZE(String str)
  • $INDEX_OF(String seq, String searchSeq)
  • $LEFT(String str, int len)
  • $LEFT_PAD(String str, int size, char padChar)
  • $LOWERCASE(String str)
  • $RIGHT(String str, int len)
  • $RIGHT_PAD(String str, int size, char padChar)
  • $STRIP(String str)
  • $STRIP_ACCENTS(String input)
  • $SUBSTRING(String str, int start)
  • $SUBSTRING(String str, int start, int end)
  • $TRIM(String str)
  • $TRUNCATE(String str, int maxWidth)
  • $UPPERCASE(String str)

Basic Examples:

  • $CAPITALIZE('word') → 'Word'
  • $INDEX_OF('word', 'r') → 2
  • $LEFT('0123456789',6) → '012345'
  • $LEFT_PAD('0123456789',3, '0') → '0000123456789'
  • $LOWERCASE('WoRd') → 'word'
  • $RIGHT('0123456789',6) → '456789'
  • $RIGHT_PAD('0123456789',3, '0') → '0123456789000'
  • $STRIP(' word ') → 'word'
  • $STRIP_ACCENTS('wórd') → 'word'
  • $SUBSTRING('word', 1) → 'ord'
  • $SUBSTRING('word', 0, 2) → 'wor'
  • $TRIM('word ') → 'word'
  • $TRUNCATE('more words', 3) → 'more'
  • $UPPERCASE('word') → 'WORD'

Nested Examples

  • $LEFT_PAD($LEFT('123456789',6),3,'0') → '000123456'
  • $RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'),6),3,'0') → 'A12345000'

Actual Example: By "actual example" I mean what I expect a real string value to look like. You will notice that there are variables written like ${var}. These variables will be replaced with actual string values using Apache Commons StringSubstitutor before the String is passed to ANTLR (if it turns out I should use it).

Initial String Written By User

\HomeDir\Students\$RIGHT(${graduation.year},2)\$LEFT_PAD($LEFT(${state.id},6),3,'0')

String After Being Processed By StringSubstitutor

\HomeDir\Students\$RIGHT('2020',2)\$LEFT_PAD($LEFT('123456789',6),3,'0')

String After Being Processed By ANTLR (And my final output)

\HomeDir\Students\20\000123456

Does ANTLR seem like something I should use for this project, or would something else be better suited?

Solution

Yes, ANTLR would be a good choice. Keep in mind that ANTLR only does the parsing for you, and provides you with a mechanism to traverse the generated parse tree. You will have to write code to evaluate the expressions.
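
To give an idea of what that evaluation code might look like, here is a minimal sketch of a function table in plain Java. The class name StringFunctions and its call method are made up for illustration, and only a handful of the functions from the question are covered. It deliberately avoids StringUtils so that it stays self-contained. The padding functions follow the examples in the question (add `size` pad characters), which is an assumption on my part and differs from StringUtils.leftPad, where `size` is the total width.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: maps a function name to its implementation.
// Every function receives its already-evaluated arguments as strings.
class StringFunctions {
    static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    static String left(String s, int n) {
        return s.substring(0, Math.min(n, s.length()));
    }

    static String right(String s, int n) {
        return s.substring(Math.max(0, s.length() - n));
    }

    static final Map<String, Function<List<String>, String>> TABLE = Map.of(
        "CAPITALIZE", a -> capitalize(a.get(0)),
        "UPPERCASE",  a -> a.get(0).toUpperCase(Locale.ROOT),
        "LOWERCASE",  a -> a.get(0).toLowerCase(Locale.ROOT),
        "LEFT",       a -> left(a.get(0), Integer.parseInt(a.get(1))),
        "RIGHT",      a -> right(a.get(0), Integer.parseInt(a.get(1))),
        // Assumption: pad with `size` extra characters, as in the question's examples.
        "LEFT_PAD",   a -> a.get(2).repeat(Integer.parseInt(a.get(1))) + a.get(0),
        "RIGHT_PAD",  a -> a.get(0) + a.get(2).repeat(Integer.parseInt(a.get(1)))
    );

    static String call(String name, List<String> args) {
        Function<List<String>, String> f = TABLE.get(name);
        if (f == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return f.apply(args);
    }

    public static void main(String[] args) {
        // Nested call evaluated inside-out: $LEFT_PAD($LEFT('123456789',6),3,'0')
        String inner = call("LEFT", List.of("123456789", "6"));
        System.out.println(call("LEFT_PAD", List.of(inner, "3", "0"))); // prints 000123456
    }
}
```

Keeping the functions in a lookup table means the tree-walking code only needs the function name and the evaluated arguments, so adding a function does not require touching the grammar.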

In your case, the lexer needs to switch modes: when it encounters a '$', it pushes an "in-a-function" mode onto the lexical mode stack, and when it sees a ')', it pops one such mode off the stack again. Because the modes are stacked, a nested call returns to the enclosing function's mode rather than to plain text.

Read all about lexical modes/stack on the ANTLR wiki: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md

Here's a quick demo of how that could work for ANTLR4 (ANTLR3 doesn't support lexical modes):

file: TLexer.g4

lexer grammar TLexer;

TEXT
 : ~[$]
 ;

FUNCTION_START
 : '$' -> pushMode(IN_FUNCTION), skip
 ;

mode IN_FUNCTION;
  FUNCTION_NESTED : '$' -> pushMode(IN_FUNCTION), skip;
  ID              : [a-zA-Z_]+;
  PAR_OPEN        : '(';
  PAR_CLOSE       : ')' -> popMode;
  NUMBER          : [0-9]+;
  STRING          : '\'' ( ~'\'' | '\'\'' )* '\'';
  COMMA           : ',';
  SPACE           : [ \t\r\n] -> skip;
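
To make the push/pop behaviour of this lexer grammar more concrete, here is a rough plain-Java simulation of the mode stack. It is not ANTLR code and not how ANTLR is implemented internally; ModeStackDemo and its methods are hypothetical names. It only tracks which characters would end up as TEXT, that is, which characters are seen while the stack is empty (default mode).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation of the lexer's mode stack (not generated by ANTLR).
class ModeStackDemo {
    /** Returns the characters that are seen in the default mode, i.e. the TEXT part. */
    static String defaultModeText(String input) {
        Deque<String> modes = new ArrayDeque<>();    // empty stack = default mode
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '$') {
                modes.push("IN_FUNCTION");           // FUNCTION_START / FUNCTION_NESTED
                i++;
            } else if (modes.isEmpty()) {
                text.append(c);                      // TEXT
                i++;
            } else if (c == '\'') {
                i = skipString(input, i);            // STRING hides any ')' or '$' inside quotes
            } else {
                if (c == ')') {
                    modes.pop();                     // PAR_CLOSE -> popMode
                }
                i++;
            }
        }
        return text.toString();
    }

    /** Skips a quoted string starting at index i; '' inside it is an escaped quote. */
    static int skipString(String s, int i) {
        i++;                                         // opening quote
        while (i < s.length()) {
            if (s.charAt(i) == '\'') {
                if (i + 1 < s.length() && s.charAt(i + 1) == '\'') {
                    i += 2;                          // escaped quote, stay inside the string
                    continue;
                }
                return i + 1;                        // closing quote
            }
            i++;
        }
        return i;                                    // unterminated string: stop at end of input
    }

    public static void main(String[] args) {
        String in = "foo $RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'), 6), 3, '0') bar";
        System.out.println("[" + defaultModeText(in) + "]"); // prints [foo  bar]
    }
}
```

Each '$' pushes one mode and each ')' outside a string pops one, so after the outermost ')' the stack is empty again and the remaining characters are treated as plain text.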

file: TParser.g4

parser grammar TParser;

options {
  tokenVocab=TLexer;
}

parse
 : atom* EOF
 ;

atom
 : text
 | function
 ;

text
 : TEXT+
 ;

function
 : ID params
 ;

params
 : PAR_OPEN ( param ( COMMA param )* )? PAR_CLOSE
 ;

param
 : NUMBER
 | STRING
 | function
 ;

With the ANTLR4 plugin for IntelliJ, you can easily test the parse rule of the parser by feeding it the following input: foo $RIGHT_PAD($RIGHT($CAPITALIZE('a123456789'), 6), 3, '0') bar. The plugin then displays the resulting parse tree, in which every nested call appears as a function node inside the params of its enclosing function.
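
The evaluation step itself is left to you. With ANTLR you would typically walk the parse tree with a generated visitor or listener; since those generated classes cannot be shown in a self-contained snippet, the sketch below uses a small hand-written recursive-descent evaluator as a stand-in. Its methods mirror the parser rules above (parse, function, param), and the class name TemplateEvaluator and the callback parameter are assumptions made for illustration. The callback receives the function name and the already-evaluated arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for a tree visitor: evaluates nested calls inside-out.
class TemplateEvaluator {
    private final String in;
    private final BiFunction<String, List<String>, String> call; // applies a named function
    private int pos = 0;

    TemplateEvaluator(String in, BiFunction<String, List<String>, String> call) {
        this.in = in;
        this.call = call;
    }

    // parse : atom* EOF ; atom : text | function
    String parse() {
        StringBuilder out = new StringBuilder();
        while (pos < in.length()) {
            if (in.charAt(pos) == '$') { pos++; out.append(function()); }
            else out.append(in.charAt(pos++));
        }
        return out.toString();
    }

    // function : ID params ; params : '(' ( param ( ',' param )* )? ')'
    private String function() {
        int start = pos;
        while (pos < in.length() && isIdChar(in.charAt(pos))) pos++;
        String name = in.substring(start, pos);
        expect('(');
        List<String> args = new ArrayList<>();
        skipSpaces();
        if (peek() != ')') {
            args.add(param());
            skipSpaces();
            while (peek() == ',') { pos++; args.add(param()); skipSpaces(); }
        }
        expect(')');
        return call.apply(name, args);
    }

    // param : NUMBER | STRING | function
    private String param() {
        skipSpaces();
        if (peek() == '$') { pos++; return function(); }
        if (peek() == '\'') return string();
        int start = pos;
        while (pos < in.length() && Character.isDigit(in.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Unexpected input at " + pos);
        return in.substring(start, pos);
    }

    // STRING: returns the value without the surrounding quotes; '' becomes '
    private String string() {
        StringBuilder sb = new StringBuilder();
        pos++; // opening quote
        while (pos < in.length()) {
            char c = in.charAt(pos++);
            if (c != '\'') { sb.append(c); continue; }
            if (peek() == '\'') { sb.append('\''); pos++; }
            else return sb.toString();
        }
        throw new IllegalArgumentException("Unterminated string");
    }

    private static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    private void skipSpaces() { while (Character.isWhitespace(peek())) pos++; }

    private void expect(char c) {
        skipSpaces();
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }

    public static void main(String[] args) {
        // Tiny callback for the demo; a real one would cover all functions.
        BiFunction<String, List<String>, String> call = (name, a) -> {
            if (name.equals("UPPERCASE")) return a.get(0).toUpperCase();
            if (name.equals("LEFT")) return a.get(0).substring(0, Integer.parseInt(a.get(1)));
            throw new IllegalArgumentException("Unknown function: " + name);
        };
        String input = "x\\$LEFT($UPPERCASE('abcdef'), 3)\\y";
        System.out.println(new TemplateEvaluator(input, call).parse()); // prints x\ABC\y
    }
}
```

The same inside-out order applies when using a visitor over the ANTLR parse tree: evaluate each param first, then apply the function named by ID to the results.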
