Can I add Antlr tokens at runtime?

Question

I have a situation where my language contains some words that aren't known at build time but will be known at runtime, which forces me to constantly rebuild and redeploy the program to take new words into account. I was wondering whether it is possible in Antlr to generate some of the tokens from a config file.

For example, in a simplified case, if I have the rule

rule : WORDS+;

WORDS : 'abc';

and my language comes across 'bcd' at runtime, I would like to be able to modify a config file to define bcd as a word rather than having to rebuild and then redeploy.

Recommended answer

You could add a collection (for example a java.util.Set<String>) to your lexer class. This collection holds all runtime words. Then you add some custom code inside the lexer rule that could match these runtime words, and change the type of the token if its text is present in the collection.
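Since the question wants the words to live in a config file, that collection could be filled from the file at startup. Below is a minimal sketch, assuming a hypothetical plain-text format with one word per line where blank lines and lines starting with '#' are ignored. RuntimeWordsConfig, parseRuntimeWords and loadRuntimeWords are illustrative names, not part of ANTLR; the resulting set is what you would hand to a lexer constructor such as RuntimeWordsLexer(CharStream, Set<String>) defined in the grammars of this answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuntimeWordsConfig {

  // Hypothetical format: one runtime word per line; blank lines and '#' comments are ignored.
  public static Set<String> parseRuntimeWords(List<String> lines) {
    Set<String> words = new HashSet<>();
    for (String line : lines) {
      String word = line.trim();
      if (!word.isEmpty() && !word.startsWith("#")) {
        words.add(word);
      }
    }
    return words;
  }

  // Reads the (hypothetical) config file as UTF-8 and parses it into the set given to the lexer.
  public static Set<String> loadRuntimeWords(Path configFile) throws IOException {
    return parseRuntimeWords(Files.readAllLines(configFile, StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Write a throwaway config file so the sketch runs on its own.
    Path config = Files.createTempFile("runtime-words", ".txt");
    Files.write(config, List.of("# words added after deployment", "foo", "", "baz"), StandardCharsets.UTF_8);
    Set<String> words = loadRuntimeWords(config);
    Files.delete(config);
    System.out.println(words.size() + " runtime words, contains foo: " + words.contains("foo"));
  }
}
```

Because the generated lexer only reads the set it is given, adding a word then means editing the file and reloading it, not regenerating or recompiling the grammar.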

Say you want to parse the input:

"foo bar baz"

and at runtime, the words "foo" and "baz" should become special runtime words. The following grammar shows how to solve this:

grammar RuntimeWords;

tokens {
  RUNTIME_WORD;
}

@lexer::members {

  private java.util.Set<String> runtimeWords;

  public RuntimeWordsLexer(CharStream input, java.util.Set<String> words) {
    super(input);
    runtimeWords = words;
  }
}

parse
  :  (w=. {System.out.printf("\%-15s :: \%s \n", tokenNames[$w.type], $w.text);})+ EOF
  ;

Word
  :  ('a'..'z' | 'A'..'Z')+
     {
       if(runtimeWords.contains(getText())) {
         $type = RUNTIME_WORD;
       }
     }
  ;

Space
  :  ' ' {skip();}
  ;

And a small test class:

import org.antlr.runtime.*;
import java.util.*;

public class Main {
  public static void main(String[] args) throws Exception {
    Set<String> words = new HashSet<String>(Arrays.asList("foo", "baz"));
    ANTLRStringStream in = new ANTLRStringStream("foo bar baz");
    RuntimeWordsLexer lexer = new RuntimeWordsLexer(in, words);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    RuntimeWordsParser parser = new RuntimeWordsParser(tokens);        
    parser.parse();
  }
}

This produces the following output:

RUNTIME_WORD    :: foo 
Word            :: bar 
RUNTIME_WORD    :: baz
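To see the retyping step of this first demo in isolation, here is a plain-Java sketch with no ANTLR involved. It assumes the input is just letters separated by spaces, splits on spaces the way the Space rule skips them, and reports a word as RUNTIME_WORD only when its text is in the runtime set. RetypeSketch and classify are hypothetical names; in the real grammar the same check lives in the Word rule's action.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetypeSketch {

  // Mirrors the Word rule's action: a word is "Word" unless its text is a runtime word.
  // Assumes the input consists only of letters separated by spaces.
  public static List<String> classify(String input, Set<String> runtimeWords) {
    List<String> result = new ArrayList<>();
    for (String text : input.split(" ")) {
      if (text.isEmpty()) {
        continue; // consecutive spaces produce empty pieces; skip them like the Space rule
      }
      String type = runtimeWords.contains(text) ? "RUNTIME_WORD" : "Word";
      // Same layout as the parser's printf: type padded to 15 chars, then " :: " and the text.
      result.add(String.format("%-15s :: %s", type, text));
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> words = new HashSet<>(Arrays.asList("foo", "baz"));
    for (String line : classify("foo bar baz", words)) {
      System.out.println(line);
    }
  }
}
```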

Demo II

Here's another demo that is more tailored to your problem (I skimmed your question too quickly at first, but I'll leave my first demo in place because it might come in handy for someone). There aren't many comments in it, but my guess is that you won't have trouble grasping what happens (if you do, don't hesitate to ask for clarification!).

grammar RuntimeWords;

@lexer::members {

  private java.util.Set<String> runtimeWords;

  public RuntimeWordsLexer(CharStream input, java.util.Set<String> words) {
    super(input);
    runtimeWords = words;
  }

  private boolean runtimeWordAhead() {
    for(String word : runtimeWords) {
      if(ahead(word)) {
        return true;
      }
    }
    return false;
  }

  private boolean ahead(String word) {
    for(int i = 0; i < word.length(); i++) {
      if(input.LA(i+1) != word.charAt(i)) {
        return false;
      }
    } 
    return true; 
  }
}

parse
  :  (w=. {System.out.printf("\%-15s :: \%s \n", tokenNames[$w.type], $w.text);})+ EOF
  ;

Word
  :  {runtimeWordAhead()}?=> ('a'..'z' | 'A'..'Z')+
  |  'abc'
  ;

Space
  :  ' ' {skip();}
  ;

And the test class:

import org.antlr.runtime.*;
import java.util.*;

public class Main {
  public static void main(String[] args) throws Exception {
    Set<String> words = new HashSet<String>(Arrays.asList("BBB", "CDEFG"));
    ANTLRStringStream in = new ANTLRStringStream("BBB abc CDEFG");
    RuntimeWordsLexer lexer = new RuntimeWordsLexer(in, words);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    RuntimeWordsParser parser = new RuntimeWordsParser(tokens);        
    parser.parse();
  }
}

This produces:

Word            :: BBB 
Word            :: abc 
Word            :: CDEFG 

Be careful if some of your runtime words start with another one. For example, if your runtime words contain "stack" and "stacker", you want the longer word to be checked first! Since a HashSet has no defined iteration order, copying the words into a list sorted by length, longest first, should do the trick.
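A minimal sketch of that ordering, assuming the words are copied from the set into a list sorted by descending length before any lookahead check runs. lookingAt is a hypothetical stand-in for the lexer's ahead(word) method that reads from a plain String instead of input.LA(...); LongestFirst, sortLongestFirst and matchAt are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LongestFirst {

  // Copy the runtime words into a list, longest first, so "stacker" is tried before "stack".
  public static List<String> sortLongestFirst(Set<String> runtimeWords) {
    List<String> sorted = new ArrayList<>(runtimeWords);
    sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));
    return sorted;
  }

  // Hypothetical stand-in for ahead(word): does `input` contain `word` starting at `pos`?
  static boolean lookingAt(String input, int pos, String word) {
    return input.startsWith(word, pos);
  }

  // Returns the first matching runtime word at `pos` (the longest, given the ordering), or null.
  public static String matchAt(String input, int pos, List<String> wordsLongestFirst) {
    for (String word : wordsLongestFirst) {
      if (lookingAt(input, pos, word)) {
        return word;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<String> words = sortLongestFirst(Set.of("stack", "stacker"));
    System.out.println(words);                          // [stacker, stack]
    System.out.println(matchAt("stacker", 0, words));   // stacker
    System.out.println(matchAt("stack it", 0, words));  // stack
  }
}
```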

One final word of caution: if only "stack" is in your runtime word list and the lexer encounters "stacker", then you probably don't want to create a "stack"-token and leave "er" dangling. In that case, you'll want to check if the character after the last char in the word is not a letter:

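private boolean ahead(String word) {
  for(int i = 0; i < word.length(); i++) {
    if(input.LA(i+1) != word.charAt(i)) {
      return false;
    }
  }
  // LA is 1-based, so the character right after the word is LA(word.length() + 1)
  int charAfterWord = input.LA(word.length() + 1);
  // only accept the word if no letter follows it; EOF (-1) is not a letter
  boolean isLetter = (charAfterWord >= 'a' && charAfterWord <= 'z')
                  || (charAfterWord >= 'A' && charAfterWord <= 'Z');
  return !isLetter;
}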
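To try the boundary check without generating a lexer, here is a standalone sketch in which a hypothetical la helper mimics input.LA(i): it is 1-based relative to a start position and returns -1 past the end of the string, the value ANTLR 3 uses for EOF. Only ASCII letters count as letters, matching the Word rule. BoundarySketch and its methods are illustrative names, not ANTLR API.

```java
public class BoundarySketch {

  // Mimics input.LA(i): 1-based lookahead from `start`, -1 (EOF) past the end.
  static int la(String input, int start, int i) {
    int index = start + i - 1;
    return index < input.length() ? input.charAt(index) : -1;
  }

  // Same letter definition as the Word rule: ASCII a-z and A-Z only.
  static boolean isLetter(int c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }

  // True only if `word` appears at `start` and is not followed by another letter.
  public static boolean ahead(String input, int start, String word) {
    for (int i = 0; i < word.length(); i++) {
      if (la(input, start, i + 1) != word.charAt(i)) {
        return false;
      }
    }
    int charAfterWord = la(input, start, word.length() + 1);
    return !isLetter(charAfterWord);
  }

  public static void main(String[] args) {
    System.out.println(ahead("stacker", 0, "stack"));   // false: "er" would dangle
    System.out.println(ahead("stack er", 0, "stack"));  // true: followed by a space
    System.out.println(ahead("stack", 0, "stack"));     // true: followed by EOF
  }
}
```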
