Extensive documentation on how to write a lexer for Pygments?

Question

I have a dictionary of Stata keywords and reasonable knowledge of Stata syntax. I would like to devote a few hours to turning it into a Stata lexer for Pygments.

However, I cannot find enough documentation about the syntax of lexers and find myself unable to start coding the lexer. Could someone point out a good tutorial for writing new lexers for Pygments?

I know about the Pygments API and the lexer development page, but honestly, these are not enough for someone like me with very limited knowledge of Python.

My strategy so far has been to look for examples. I have found quite a few, e.g. Puppet, Sass, Scala, Ada, but they only helped so much. Any help with how to get started from my Stata keywords would be welcome.

Answer

If you just wanted to highlight the keywords, you'd start with this (replacing the keywords with your own list of Stata keywords):

import re

from pygments.lexer import RegexLexer
from pygments.token import Keyword


class StataLexer(RegexLexer):

    name = 'Stata'
    aliases = ['stata']
    filenames = ['*.stata']
    flags = re.MULTILINE | re.DOTALL

    tokens = {
        'root': [
            # Placeholder keyword list; replace with your own Stata keywords.
            (r'(abstract|case|catch|class|do|else|extends|false|final|'
             r'finally|for|forSome|if|implicit|import|lazy|match|new|null|'
             r'object|override|package|private|protected|requires|return|'
             r'sealed|super|this|throw|trait|try|true|type|while|with|'
             r'yield)\b', Keyword),
        ],
    }
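As a quick smoke test, something like the following sketch should render highlighted output, assuming the class above is defined in the same module; the sample string is purely illustrative:

from pygments import highlight
from pygments.formatters import TerminalFormatter

# Any snippet containing the placeholder keywords above will do for a test.
sample = 'if (true) return null else throw'

# StataLexer is the class defined above; TerminalFormatter emits ANSI colours.
print(highlight(sample, StataLexer(), TerminalFormatter()))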

I think your problem is not that you don't know any Python, but that you don't have much experience with writing a lexer or understanding how a lexer works? Because this implementation is fairly straightforward.

Then, if you want to add more stuff, add an extra element to the root list, a two-element tuple, where the first element is a regular expression and the second element designates a syntactic class.
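Pygments tries the rules in order at the current position, so more specific patterns should come before catch-all ones. Here is a small sketch of what such extra rules could look like; the comment, string and number patterns are simplified assumptions rather than exact Stata syntax, and the extra_rules name is only illustrative:

from pygments.token import Comment, Name, Number, String, Text

# Extra (regex, token type) entries you could append to the 'root' list above.
extra_rules = [
    (r'//.*?$', Comment.Single),   # line comment (simplified assumption)
    (r'"[^"]*"', String),          # double-quoted string literal
    (r'\d+(\.\d+)?', Number),      # integer or decimal literal
    (r'\w+', Name),                # any other identifier-like word
    (r'\s+', Text),                # whitespace
]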
