Antlr: how to match everything between the other recognized tokens?

Question

How do I match all of the leftover text between the other tokens in my lexer?

Here is my code:

grammar UserQuery;

expr:  expr AND expr
    | expr OR expr
    | NOT expr
    | TEXT+
    | '(' expr ')'
    ;

OR  :    'OR';
AND :    'AND';
NOT :    'NOT';
LPAREN : '(';
RPAREN : ')';

TEXT: .+?;

When I run the lexer on "xx AND yy", I get these tokens:

x type:TEXT
x type:TEXT
  type:TEXT
AND type:'AND'
  type:TEXT
y type:TEXT
y type:TEXT

This sort of works, except that I don't want each character to be a token. I'd like to consolidate all of the leftover text into a single TEXT token.

Answer

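I don't think this is possible without a delimiter. Otherwise a greedy catch-all lexer rule (such as TEXT: .+;) will match all of your input, including your explicit tokens, because when several lexer rules can match, the longest match wins. Your non-greedy .+? avoids that only by stopping after a single character, which is why every character ends up as its own TEXT token.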
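To illustrate the longest-match principle, here is a minimal Python sketch. It is not ANTLR: the make_rules and tokenize helpers, and the use of Python regular expressions in place of ANTLR lexer rules, are assumptions made purely for illustration. At each position every rule is tried, the longest match wins, and on a tie the rule declared first is kept.

```python
import re

# Hypothetical model of the question's lexer rules, in declaration order.
# Python regexes stand in for ANTLR lexer rules; this only approximates
# ANTLR's matching and is meant to illustrate the longest-match principle.
def make_rules(text_pattern):
    return [
        ("OR", r"OR"),
        ("AND", r"AND"),
        ("NOT", r"NOT"),
        ("LPAREN", r"\("),
        ("RPAREN", r"\)"),
        ("TEXT", text_pattern),  # catch-all rule, declared last
    ]


def tokenize(source, rules):
    # re.DOTALL lets '.' match newlines too, like '.' in an ANTLR lexer rule.
    compiled = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(source):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(source, pos)
            # Longest match wins. Only a strictly longer match replaces the
            # current best, so on a tie the earlier-declared rule is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((source[pos:pos + best_len], best_name))
        pos += best_len
    return tokens
```

With a greedy pattern, tokenize("xx AND yy", make_rules(r".+")) returns a single TEXT token covering the whole input, keyword included. With the lazy pattern r".+?" it reproduces the one-character TEXT tokens from the question; AND is recognized there only because its three-character match beats TEXT's one-character match.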

Now, if you can accept that the text has to be delimited (here with single quotes), and you add a simple whitespace rule to skip the spaces in between, then you get something like this:

[@0,0:14=''longest token'',<TEXT>,1:0]
[@1,16:18='AND',<'AND'>,1:16]
[@2,20:23=''yy'',<TEXT>,1:20]
[@3,24:23='<EOF>',<EOF>,1:24]

from this grammar:

grammar UserQuery;

expr:  expr AND expr
    | expr OR expr
    | NOT expr
    | TEXT
    | '(' expr ')'
    ;

OR  :    'OR';
AND :    'AND';
NOT :    'NOT';
LPAREN : '(';
RPAREN : ')';

TEXT : '\'' .*? '\'';
WS: [ \t\r\n] -> skip;

using this input:

'longest token' AND 'yy'

This is very similar to the way comments and strings are often handled in programming languages: there is a starting and an ending delimiter, and everything in between is tokenized as one big token. Comments are often discarded, but here we keep the token, just as we would a string. Hope this helps.
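The delimited approach can be modeled the same way. This is again a hypothetical Python sketch rather than ANTLR itself: the RULES table, the SKIPPED set and the tokenize function are illustrative assumptions, with TEXT translated to the regex '.*?' and WS skipped. Because a TEXT match can only start at a quote, the keywords outside the quotes are free to match, while anything inside the quotes stays part of one TEXT token.

```python
import re

# Hypothetical Python model of the answer's lexer rules, in declaration
# order. TEXT mirrors '\'' .*? '\'' and WS mirrors [ \t\r\n] -> skip.
RULES = [
    ("OR", r"OR"),
    ("AND", r"AND"),
    ("NOT", r"NOT"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("TEXT", r"'.*?'"),
    ("WS", r"[ \t\r\n]"),
]
SKIPPED = {"WS"}  # rules whose matches are dropped, like "-> skip"


def tokenize(source):
    # re.DOTALL lets '.' match newlines too, like '.' in an ANTLR lexer rule.
    compiled = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in RULES]
    tokens = []
    pos = 0
    while pos < len(source):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(source, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in SKIPPED:
            # (start, stop, text, type) with an inclusive stop index,
            # like the start:stop pair in ANTLR's token dump.
            tokens.append((pos, pos + best_len - 1, source[pos:pos + best_len], best_name))
        pos += best_len
    return tokens
```

For the input 'longest token' AND 'yy' it returns (0, 14, "'longest token'", "TEXT"), (16, 18, "AND", "AND") and (20, 23, "'yy'", "TEXT"), the same start:stop ranges as the token dump above (the model does not emit an EOF token). An input such as 'a AND b' OR 'c' shows that a keyword inside quotes stays part of the TEXT token.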