How to tokenize Perl source code?

Question

I have some reasonable (not obfuscated) Perl source files, and I need a tokenizer that splits them into tokens and returns the type of each token, e.g. for the script

print "Hello, World!\n";

it would return something like this:

  • keyword, 5 bytes
  • whitespace, 1 byte
  • double-quoted string, 17 bytes
  • semicolon, 1 byte
  • whitespace, 1 byte

Which is the best library (preferably written in Perl) for this? It has to be reasonably correct, i.e. it should be able to parse syntactic constructs like qq{{\}}}, but it doesn't have to know about special parsers like Lingua::Romana::Perligata. I know that parsing Perl is Turing-complete, and only Perl itself can do it right, but I don't need absolute correctness: the tokenizer can fail or be incompatible or assume some default in some very rare corner cases, but it should work correctly most of the time. It must be better than the syntax highlighting built into an average text editor.

FYI, I tried the PerlLexer in pygments, which works reasonably well for most constructs, except that it cannot find the second print keyword in this one:

print length(<<"END"); print "\n";
String
END

Recommended answer

PPI
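
A minimal sketch of how PPI could be used for this, assuming the PPI distribution from CPAN is installed. The helper name tokenize_perl is made up for illustration; PPI::Tokenizer, get_token and content are part of PPI. Note that PPI reports token classes such as PPI::Token::Word rather than "keyword", and that length counts characters, which equals bytes only for single-byte source text.

```perl
use strict;
use warnings;
use PPI;    # CPAN module, not part of core Perl

# Hypothetical helper: returns a list of [token class, length] pairs.
sub tokenize_perl {
    my ($source) = @_;
    my $tokenizer = PPI::Tokenizer->new(\$source)
        or die "could not create a PPI::Tokenizer";
    my @tokens;
    # get_token returns a token object, 0 at end of input, undef on error;
    # this loop stops on either without telling them apart.
    while (my $token = $tokenizer->get_token) {
        push @tokens, [ ref($token), length($token->content) ];
    }
    return @tokens;
}

# The script from the question, including its trailing newline.
my $source = 'print "Hello, World!\n";' . "\n";
printf "%-28s %d\n", @$_ for tokenize_perl($source);
```

The same helper can be run on the here-doc example from the question to check how PPI handles the second print.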