How can I efficiently match many different regex patterns in Perl?

Views: 31 | Posted: 2021/6/15 20:04:55 | Tags: regex, perl


Question

I have a growing list of regular expressions that I use to parse log files, searching for "interesting" error and debug statements. I currently break them into 5 buckets, with most of them falling into 3 large buckets. I have over 140 patterns so far, and the list continues to grow.

Most of the regular expressions are simple, but they're also fairly unique, so I rarely get the chance to catch multiple matches with a single pattern. Because of the nature of what I'm matching, the patterns tend to be obscure and therefore seldom match, so I'm doing a TON of work on each input line, with the end result being that the line fails to match anything or matches one of the generic patterns at the very end.

And because of the quantity of input (hundreds of megabytes of log files), I sometimes wait a minute or two for the script to finish. Hence my desire for a more efficient solution. I'm not interested in sacrificing clarity for speed, though.

I currently have the regular expressions set up like this:

    if (($line =~ m{Failed in routing out}) ||
        ($line =~ m{Agent .+ failed}) ||
        ($line =~ m{Record Not Exist in DB}) ||
        ...

Is there a better way of structuring this so it's more efficient, yet still maintainable? Thanks!

Recommended answer

You might want to take a look at Regexp::Assemble. It's intended to handle exactly this sort of problem.

Code lifted from the module's synopsis:

use Regexp::Assemble;

my $ra = Regexp::Assemble->new;
$ra->add( 'ab+c' );
$ra->add( 'ab+-' );
$ra->add( 'a\w\d+' );
$ra->add( 'a\d+' );
print $ra->re; # prints a(?:\w?\d+|b+[-c])
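
Applied to the three patterns quoted in the question, the same approach might look like the sketch below. It uses only the `new`, `add`, and `re` calls shown in the synopsis; the sample log lines and the variable names (`@patterns`, `$interesting`, `@hits`) are made up for illustration, and no speed-up is claimed without measuring on real logs.

```perl
use strict;
use warnings;
use Regexp::Assemble;

# Patterns copied from the question. Keeping them in a plain list
# means adding pattern number 141 is a one-line change.
my @patterns = (
    'Failed in routing out',
    'Agent .+ failed',
    'Record Not Exist in DB',
);

# Assemble the whole list into one regex, once, before reading any input.
my $ra = Regexp::Assemble->new;
$ra->add($_) for @patterns;
my $interesting = $ra->re;

# Hypothetical log lines standing in for the real input.
my @log_lines = (
    'Agent 42 failed to respond',
    'Routine heartbeat OK',
    'Lookup error: Record Not Exist in DB',
);

# One match per line instead of one match per pattern per line.
my @hits = grep { $_ =~ $interesting } @log_lines;
print "$_\n" for @hits;
```

The key design point is that the combined regex is built a single time outside the per-line loop, so each input line is tested against one pattern rather than 140 separate ones.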

You can even slurp your regex collection out of a separate file.
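
A minimal sketch of that idea, assuming a pattern file with one regex per line. The file layout, the convention of skipping blank lines and lines starting with `#`, and all variable names are assumptions made for this example, not something the module prescribes. The file is written to a temporary location here only so the sketch is self-contained; in practice it would be a file you maintain by hand.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Regexp::Assemble;

# Create a stand-in pattern file (hypothetical contents).
my ($out, $pattern_file) = tempfile(UNLINK => 1);
print {$out} <<'END';
# routing errors
Failed in routing out
Agent .+ failed

# database errors
Record Not Exist in DB
END
close $out or die "close $pattern_file: $!";

# Read the file line by line and feed each pattern to the assembler.
my $ra = Regexp::Assemble->new;
my $count = 0;
open my $in, '<', $pattern_file or die "open $pattern_file: $!";
while (my $pattern = <$in>) {
    chomp $pattern;
    next if $pattern =~ /^\s*$/;    # skip blank lines
    next if $pattern =~ /^\s*#/;    # skip comment lines (assumed convention)
    $ra->add($pattern);
    $count++;
}
close $in;

my $interesting = $ra->re;
print "loaded $count patterns\n";
print "interesting\n" if 'Record Not Exist in DB (id=7)' =~ $interesting;
```

Keeping the patterns in a data file separates the growing list from the matching logic, which addresses the maintainability concern in the question. The module's documentation also describes an `add_file` method for reading patterns directly; check it for the exact options before relying on it.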

