How to tokenize Perl source code?

Published: 2021/6/15 21:03:32. Tags: perl, tokenize


Problem description


I have some reasonable (not obfuscated) Perl source files, and I need a tokenizer that splits them into tokens and returns the token type of each one, e.g. for the script

print "Hello, World!\n";

it would return something like this:

keyword 5 bytes
whitespace 1 byte
double-quoted string 17 bytes
semicolon 1 byte
whitespace 1 byte
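To make the requested output format concrete, here is a deliberately naive regex lexer in Perl that reproduces the listing above for the one-line example. It is an illustrative sketch only: the function name `naive_tokenize`, the token type names and the tiny keyword list are assumptions made for this example, and it handles none of the hard cases (quote-like operators, heredocs, regexes), which is exactly why a real library is needed.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny keyword list (an assumption for this sketch).
my %KEYWORDS = map { $_ => 1 } qw(print length);

# Naive \G/gc lexer: recognises only whitespace, simple double-quoted
# strings, semicolons and identifiers. Anything else makes it die.
sub naive_tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if ($src =~ /\G(\s+)/gc) {
            push @tokens, ['whitespace', $1];
        }
        elsif ($src =~ /\G("(?:[^"\\]|\\.)*")/gc) {    # backslash escapes allowed
            push @tokens, ['double-quoted string', $1];
        }
        elsif ($src =~ /\G(;)/gc) {
            push @tokens, ['semicolon', $1];
        }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc) {
            push @tokens, [$KEYWORDS{$1} ? 'keyword' : 'bareword', $1];
        }
        else {
            die "unsupported construct at offset " . pos($src) . "\n";
        }
    }
    return @tokens;
}

# The example script from the question, followed by a newline.
my $script = qq{print "Hello, World!\\n";\n};
for my $tok (naive_tokenize($script)) {
    my ($type, $text) = @$tok;
    my $len = length $text;    # characters; equal to bytes for ASCII source
    printf "%s %d %s\n", $type, $len, $len == 1 ? 'byte' : 'bytes';
}
```

Running it prints the same five lines as the listing above. A loop of one regex per token type is easy to write, but it already dies on something as simple as `qq{x}`.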


Which is the best library (preferably written in Perl) for this? It has to be reasonably correct, i.e. it should be able to parse syntactic constructs like qq{{\}}}, but it doesn't have to know about special parsers like Lingua::Romana::Perligata. I know that parsing Perl is Turing-complete, and only Perl itself can do it right, but I don't need absolute correctness: the tokenizer can fail or be incompatible or assume some default in some very rare corner cases, but it should work correctly most of the time. It must be better than the syntax highlighting built into an average text editor.
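The `qq{{\}}}` example is hard for regex-based highlighters because bracket delimiters may nest and a backslash escapes the next character. One way to find where such a construct ends is to scan character by character with a depth counter. The Perl sketch below is only an illustration of that idea: the helper name `quote_like_length` is hypothetical, and it assumes `q`, `qq` or `qw` with a bracket delimiter, ignoring other delimiters and interpolation.

```perl
use strict;
use warnings;

# Matching closing bracket for each opening bracket delimiter.
my %CLOSE = ('{' => '}', '(' => ')', '[' => ']', '<' => '>');

# Hypothetical helper: length of the q/qq/qw token starting at $start,
# or undef if it is not a bracket-delimited quote or is unterminated.
sub quote_like_length {
    my ($src, $start) = @_;
    substr($src, $start) =~ /\A(?:qq|qw|q)\s*([\{\(\[<])/ or return undef;
    my ($open, $close) = ($1, $CLOSE{$1});
    my $i     = $start + $+[0];    # first character after the opening delimiter
    my $depth = 1;
    while ($i < length $src) {
        my $c = substr($src, $i, 1);
        if ($c eq '\\') {          # backslash: skip it and the escaped character
            $i += 2;
            next;
        }
        if ($c eq $open) {
            $depth++;              # nested, balanced pair
        }
        elsif ($c eq $close) {
            return $i + 1 - $start if --$depth == 0;
        }
        $i++;
    }
    return undef;                  # ran off the end: unterminated
}

my $code = 'print qq{{\}}}, "\n";';
my $at   = index($code, 'qq');
print substr($code, $at, quote_like_length($code, $at)), "\n";    # qq{{\}}}
```

By contrast, a lazy regex such as `/qq\{.*?\}/` stops at `qq{{\}`, because it neither tracks nesting nor honours the escaped brace.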


FYI, I tried the PerlLexer in Pygments, which works reasonably well for most constructs, except that it cannot find the second print keyword in this one:

print length(<<"END"); print "\n";
String
END
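This snippet trips up many highlighters because the body of `<<"END"` starts on the line after the operator, while the rest of the current line (`); print "\n";`) is still ordinary code. One way to handle this, shown here as a Perl sketch rather than as how Pygments or any particular library works, is to queue pending terminators and consume the heredoc bodies only when the lexer reaches the end of the line. The helper name `tokenize_with_heredocs`, the token type names and the supported subset (`<<"TAG"` heredocs only, no `<<~`, `<<'TAG'` or bare `<<TAG`) are assumptions for this example.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny keyword list (an assumption for this sketch).
my %KEYWORDS = map { $_ => 1 } qw(print length);

# Lexer sketch supporting only: <<"TAG" heredocs, identifiers, ( ) ;,
# simple double-quoted strings and whitespace.
sub tokenize_with_heredocs {
    my ($src) = @_;
    my (@tokens, @pending);    # @pending: heredoc terminators awaiting a body
    pos($src) = 0;
    while (pos($src) < length $src) {
        if ($src =~ /\G(\n)/gc) {
            push @tokens, ['whitespace', $1];
            # End of line: the bodies of all pending heredocs follow, in order.
            for my $tag (@pending) {
                $src =~ /\G(.*?^\Q$tag\E(?:\n|\z))/gcms
                    or die "unterminated heredoc <<\"$tag\"\n";
                push @tokens, ['heredoc body', $1];
            }
            @pending = ();
        }
        elsif ($src =~ /\G([ \t]+)/gc) {
            push @tokens, ['whitespace', $1];
        }
        elsif ($src =~ /\G(<<"(\w+)")/gc) {
            push @tokens, ['heredoc', $1];
            push @pending, $2;     # body is read at the next newline
        }
        elsif ($src =~ /\G("(?:[^"\\]|\\.)*")/gc) {
            push @tokens, ['double-quoted string', $1];
        }
        elsif ($src =~ /\G([();])/gc) {
            push @tokens, ['structure', $1];
        }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc) {
            push @tokens, [$KEYWORDS{$1} ? 'keyword' : 'bareword', $1];
        }
        else {
            die "unsupported construct at offset " . pos($src) . "\n";
        }
    }
    return @tokens;
}

# The snippet from the question.
my $script = <<'PERL';
print length(<<"END"); print "\n";
String
END
PERL

my @keywords = map { $_->[1] } grep { $_->[0] eq 'keyword' } tokenize_with_heredocs($script);
print "@keywords\n";    # print length print
```

Here `length` counts as a keyword only because of the hypothetical keyword list; the point is that the second `print`, which follows the heredoc operator on the same line, is still found.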
