How can I efficiently match many different regex patterns in Perl?


Question

I have a growing list of regular expressions that I use to parse log files, searching for "interesting" error and debug statements. I currently split them into 5 buckets, with most of them falling into 3 large buckets. I have over 140 patterns so far, and the list keeps growing.

Most of the regular expressions are simple, but they are also fairly unique, so my opportunities to catch multiple matches with a single pattern are few and far between. Because of the nature of what I'm matching, the patterns tend to be obscure and therefore seldom match, so I'm doing a TON of work on each input line only for it to match nothing, or to match one of the generic patterns at the very end.

And because of the quantity of input (hundreds of megabytes of log files), I sometimes wait a minute or two for the script to finish. Hence my desire for a more efficient solution. I'm not interested in sacrificing clarity for speed, though.

I currently have the regular expressions set up like this:

if (($line =~ m{Failed in routing out}) ||
    ($line =~ m{Agent .+ failed}) ||
    ($line =~ m{Record Not Exist in DB}) ||
    ...

Is there a better way of structuring this so it's more efficient, yet still maintainable? Thanks!

Recommended answer

You might want to take a look at Regexp::Assemble. It's intended to handle exactly this sort of problem: it combines many individual patterns into a single regular expression.

Code lifted from the module's synopsis:

use Regexp::Assemble;

my $ra = Regexp::Assemble->new;
$ra->add( 'ab+c' );
$ra->add( 'ab+-' );
$ra->add( 'a\w\d+' );
$ra->add( 'a\d+' );
print $ra->re; # prints a(?:\w?\d+|b+[-c])

You can even slurp your regex collection out of a separate file.
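
As a rough sketch of that idea, the snippet below reads one pattern per line and feeds each to add(), using only the new, add, and re calls shown in the synopsis. The in-memory filehandle stands in for a real patterns file (the name patterns.txt is hypothetical), and the two sample log lines are made up for illustration.

```perl
use strict;
use warnings;
use Regexp::Assemble;

# Stand-in for a separate patterns file, one regex per line. In a real
# script you would open a file (e.g. a hypothetical 'patterns.txt') instead.
my $patterns = <<'END';
Failed in routing out
Agent .+ failed
Record Not Exist in DB
END

open my $fh, '<', \$patterns or die "cannot open patterns: $!";

my $ra = Regexp::Assemble->new;
while (my $pattern = <$fh>) {
    chomp $pattern;
    next unless length $pattern;    # skip blank lines
    $ra->add($pattern);
}
close $fh;

my $re = $ra->re;    # one assembled regex covering every pattern

# Hypothetical log lines, for illustration only.
for my $line ('Agent 7 failed', 'all good') {
    print "interesting: $line\n" if $line =~ $re;
}
```

Keeping the patterns in their own file means the list can grow without touching the matching code, and each input line is tested against one assembled regex rather than 140 separate ones.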
