Lengthy Perl regex

Question

This may seem like a somewhat odd question, but anyway, to the point:

I have a string that I need to search for many possible character occurrences in several combinations (so character classes are out of the question). What would be the most efficient way to do this?

I was thinking of either stacking it all into one regex:

if ($txt =~ /^(?:regex1|regex2|regex3)$/) {}

or using several 'smaller' comparisons, though I'd assume that won't be very efficient:

if ($txt =~ /^regex1$/ || $txt =~ /^regex2$/ || $txt =~ /^regex3$/) {}

or perhaps nesting several if statements.

I'd appreciate any suggestions or other input on this. Thanks.

Answer

Ever since way back in v5.9.2 (a development release; the optimisation first shipped in a stable release with 5.10.0), Perl compiles a set of N alternatives like:

/string1|string2|string3|string4|string5|.../

into a trie data structure, and if that is the first thing in the pattern, even uses Aho–Corasick matching to find the start point very quickly.
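A minimal sketch of building such an alternation from a list of literal strings, assuming the strings are plain text rather than regex fragments. The word list and the names @words, $alternation and $re are hypothetical; quotemeta escapes any metacharacters, and the ^(?:...)$ anchoring mirrors the question.

```perl
use strict;
use warnings;

# Hypothetical list of literal strings to search for.
my @words = qw(apple banana cherry date elderberry);

# Escape each word so characters like . or + match literally,
# then join everything into a single alternation.
my $alternation = join '|', map { quotemeta($_) } @words;

# One compiled pattern; Perl can turn this alternation into a trie.
my $re = qr/^(?:$alternation)$/;

for my $txt (qw(banana grape date)) {
    print "$txt: ", ($txt =~ $re ? "match" : "no match"), "\n";
}
```

This reports banana and date as matches and grape as no match. Building the pattern once with qr// and reusing it avoids recompiling the alternation on every test.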

That means that your match of N alternatives will now run in O(1) time with respect to N, instead of in the O(N) time that this:

if (/string1/ || /string2/ || /string3/ || /string4/ || /string5/ || ...)

would take.

So you can have O(1) or O(N) performance: your choice.
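To see the difference on your own data, here is a rough sketch using the core Benchmark module's cmpthese. The generated word list, the inputs, and the sub names match_one and match_many are hypothetical; match_many stands in for the chain of || comparisons by trying one small pattern per string. The speed ratio you get depends on your Perl build and data, so treat the output as a measurement rather than a guarantee.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 300 literal strings; inputs word1..word600,
# so the first half match and the second half do not.
my @words  = map { "word$_" } 1 .. 300;
my @inputs = map { "word$_" } 1 .. 600;

# Approach 1: one alternation (eligible for the trie optimisation).
my $alternation = join '|', map { quotemeta($_) } @words;
my $one_re      = qr/^(?:$alternation)$/;

# Approach 2: one small pattern per string, tried in turn.
my @many_re = map { qr/^\Q$_\E$/ } @words;

sub match_one {
    my ($s) = @_;
    return $s =~ $one_re ? 1 : 0;
}

sub match_many {
    my ($s) = @_;
    for my $re (@many_re) {
        return 1 if $s =~ $re;
    }
    return 0;
}

# Sanity check: both approaches must agree before timing them.
for my $s (@inputs) {
    die "mismatch on $s\n" unless match_one($s) == match_many($s);
}

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    one_alternation => sub { match_one($_)  for @inputs },
    chain_of_tests  => sub { match_many($_) for @inputs },
});
```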

If you turn on the re pragma's debugging with use re "debug" (or -Mre=debug on the command line), Perl will show these trie structures in your compiled patterns.
