ANTLR4 Tokenizing a Huge Set of Keywords


Question

I want to embed some known identifier names into my grammar. For example, the class names of my project are known, and I want to tell the lexer which identifiers are known keywords that actually belong to the class-name token. But since I have a long list of class names (hundreds of names), I don't want to create a class-name lexer rule by listing every known class name in the rule, because that would make my grammar file too large.

Is it possible to place my keywords in a separate file? One possibility I am considering is to place the keywords in a Java class that the generated lexer class will subclass. In that case, my lexer's semantic predicate can call a method in the custom lexer superclass to check whether the input token matches one of the names in my long list, and the list itself can live in that superclass's source code.

However, the ANTLR4 book says that the grammar option 'superClass' in a combined grammar sets only the parser's superclass. How can I set my lexer's superclass if I still want to use a combined grammar? Or is there a better way to put my long list of keywords into a separate "keyword file"?

Recommended answer

If you want each keyword to have its own token type, you can do the following:

  1. Add a tokens{} block to the grammar to create a token for each keyword. This ensures that a unique token type is created for each of your keywords.

tokens {
    Keyword1,
    Keyword2,
    ...
}

  2. Create a separate class MyLanguageKeywords similar to the following:

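    import java.util.HashMap;
    import java.util.Map;

    public class MyLanguageKeywords {
        // Maps keyword text to the token type declared in the tokens{} block.
        private static final Map<String, Integer> KEYWORDS = new HashMap<>();
        static {
            KEYWORDS.put("keyword1", MyLanguageParser.Keyword1);
            KEYWORDS.put("keyword2", MyLanguageParser.Keyword2);
            ...
        }

        // Returns the keyword's token type, or Identifier if the text is not a keyword.
        public static int getKeywordOrIdentifierType(String text) {
            Integer type = KEYWORDS.get(text);
            if (type == null) {
                return MyLanguageParser.Identifier;
            }

            return type;
        }
    }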
    

  3. Add an Identifier lexer rule to your grammar that handles both keywords and identifiers.

    Identifier
        :   [a-zA-Z_] [a-zA-Z0-9_]*
            {_type = MyLanguageKeywords.getKeywordOrIdentifierType(getText());}
        ;
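
The question asks for a single class-name token whose names live in a separate file, so the per-keyword map above could plausibly be swapped for a set loaded from a text source. The following is only a sketch under stated assumptions: the class ClassNameTable, the one-name-per-line format with '#' comments, and the constants CLASS_NAME and IDENTIFIER are all hypothetical stand-ins (in a real project the constants would come from the generated parser, for example a ClassName token declared in tokens{}).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of ANTLR: classifies identifier text as a
// known class name or a plain identifier using names read from a text source.
final class ClassNameTable {
    // Stand-ins for constants a generated parser would provide,
    // e.g. MyLanguageParser.ClassName and MyLanguageParser.Identifier.
    static final int CLASS_NAME = 1;
    static final int IDENTIFIER = 2;

    private final Set<String> names = new HashSet<>();

    // Reads one name per line; blank lines and lines starting with '#' are skipped.
    ClassNameTable(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String name = line.trim();
                if (!name.isEmpty() && !name.startsWith("#")) {
                    names.add(name);
                }
            }
        }
    }

    // Returns CLASS_NAME for a listed name, IDENTIFIER otherwise.
    int getClassNameOrIdentifierType(String text) {
        return names.contains(text) ? CLASS_NAME : IDENTIFIER;
    }

    public static void main(String[] args) throws IOException {
        // In a real project the Reader would wrap a file or classpath resource.
        ClassNameTable table = new ClassNameTable(
                new StringReader("# known class names\nCustomer\nOrder\n"));
        System.out.println(table.getClassNameOrIdentifierType("Customer") == CLASS_NAME);
        System.out.println(table.getClassNameOrIdentifierType("customer") == CLASS_NAME);
    }
}
```

With a table like this, the action in the Identifier rule would pass getText() to the lookup and hand the result to the lexer (for instance through setType), so the grammar file itself never has to list the names. The lookup is case-sensitive, which matches how the rule above treats identifier text.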
    
