Unroll Loop, when to use

Question

I'm trying to understand loop unrolling in regex. What is the big difference between:

MINISTÉRIO[\s\S]*?PÁG

MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG

In this case:

http://regexr.com/3dmlr

Why should I use the second one if the first does the SAME thing?

Thanks.

Recommended answer

What is unrolling the loop

See this source on the unrolling the loop technique:

This optimisation technique is used to optimize repeated alternation of the form (expr1|expr2|...)*. Such expressions are not uncommon, and using another repetition inside an alternation may also lead to super-linear matching. Super-linear matching arises from the nondeterministic expression (a*)*.

The unrolling the loop technique is based on the hypothesis that, in most cases, you know which alternative in a repeated alternation should be the most usual and which one is exceptional. We will call the first one the normal case and the second one the special case. The general syntax of the unrolling the loop technique can then be written as:

normal*(special normal*)*

So, this is an optimization technique where alternations are turned into linearly matching atoms.

This makes these unrolled patterns very efficient since they involve less backtracking.
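
As a sketch of the normal*(special normal*)* template, the Python snippet below builds an unrolled pattern from two fragments and checks that it accepts the same strings as the equivalent repeated alternation (normal|special)*. The helper name `unroll` and the double-quoted-string example are illustrative assumptions, not part of the quoted source. The construction assumes `normal` is a single-character token, `special` never matches the empty string, and the two can never match at the same position; otherwise the unrolled loop can still backtrack heavily.

```python
import re

def unroll(normal: str, special: str) -> str:
    """Return the unrolled loop normal*(?:special normal*)* as a regex string.

    Hypothetical helper. Assumes `normal` matches exactly one character,
    `special` cannot match "", and the two never match at the same position.
    """
    return f"(?:{normal})*(?:(?:{special})(?:{normal})*)*"

NORMAL = r'[^"\\]'   # normal case: any character except a quote or a backslash
SPECIAL = r'\\.'     # special case: a backslash escape such as \" or \t

# Repeated alternation (expr1|expr2)* versus its unrolled equivalent.
alternation = re.compile(f'"(?:{NORMAL}|{SPECIAL})*"')
unrolled = re.compile(f'"{unroll(NORMAL, SPECIAL)}"')

samples = [r'"plain"', r'"say \"hi\""', r'"tab\there"', r'"unterminated']
for s in samples:
    a = alternation.fullmatch(s) is not None
    u = unrolled.fullmatch(s) is not None
    print(s, a, u)  # both columns should agree for every sample
```

The unrolled version lets the common case be consumed by a single starred character class, and the loop is entered only when the special case (a backslash) actually appears.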

Your MINISTÉRIO[\s\S]*?PÁG is a non-unrolled pattern, while MINISTÉRIO[^P]*(?:P(?!ÁG)[^P]*)*PÁG is unrolled. See the demos (both saved with the PCRE option so that the number of steps is shown in the box above; regex performance differs across regex engines, but this will tell you exactly what the performance difference is). Try adding more text to the input: the first regex will start requiring more steps to finish, while the second one will only show more steps after you add a P. So, in texts where the character used in the known part is not common, unrolled patterns are very efficient.
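
To check that the lazy and unrolled patterns agree, here is a minimal sketch using Python's standard `re` module. The sample strings are hypothetical, and the unrolled pattern is the one from the answer (with the shorter `(?!ÁG)` lookahead), not the one from the question. Python's `re` does not report step counts the way the PCRE demos do, so this checks behavior only, not performance.

```python
import re

LAZY = re.compile(r"MINISTÉRIO[\s\S]*?PÁG")
UNROLLED = re.compile(r"MINISTÉRIO[^P]*(?:P(?!ÁG)[^P]*)*PÁG")

# Hypothetical text: a header, a body containing a stray "P", then the page marker.
sample = "MINISTÉRIO DA SAÚDE\nPortaria n. 1\nTexto...\nPÁG : 1/2"

lazy_m = LAZY.search(sample)
unrolled_m = UNROLLED.search(sample)
print(lazy_m.group() == unrolled_m.group())  # True: both stop at the first PÁG

# Hypothetical failing case: leading delimiter present, trailing one missing.
no_end = "MINISTÉRIO DA SAÚDE\n" + "texto sem marcador " * 1000
print(LAZY.search(no_end), UNROLLED.search(no_end))  # None None
```

In the unrolled version, `[^P]*` swallows every run of non-P characters at once, and the `P(?!ÁG)` branch is tried only at each literal P.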

See the Difference between .*?, .* and [^"]*+ quantifiers section in my answer to understand how lazy matching works (your [\s\S]*? is the same as .*? with a DOTALL modifier in languages that allow a . to match a newline, too).
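
The equivalence between `[\s\S]*?` and `.*?` with a DOTALL modifier can be seen in Python, where the modifier is the `re.DOTALL` flag (a small sketch on a made-up string):

```python
import re

text = "start\nline 2\nend"

a = re.search(r"start[\s\S]*?end", text)          # [\s\S] matches any char, including "\n"
b = re.search(r"start.*?end", text, re.DOTALL)    # DOTALL lets . match "\n" too
c = re.search(r"start.*?end", text)               # without DOTALL, . stops at "\n"

print(a.group() == b.group(), c)  # True None
```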

Is the lazy matching pattern always slow and inefficient? Not always. With very short strings (1-10 symbols), lazy dot matching is usually better. With long inputs that may contain the leading delimiter but no trailing one, however, it may lead to excessive backtracking and, in turn, to timeout issues.

Use unrolled patterns when you have long, arbitrary input that may not contain a match.

Use lazy matching when you control the input and know there will always be a match (for example, a known set of log formats).

  1. Tempered greedy token

  2. Regular string literals ("String\u0020:\"text\""): "[^"\\]*(?:\\.[^"\\]*)*"

  3. Multiline comment regex (/* Comments */): /\*[^*]*\*+(?:[^/*][^*]*\*+)*/

  4. @<...>@ comment regex: @<[^>]*(?:>[^@]*)*@
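
A quick Python check of the string-literal and multiline-comment patterns from the list, run against a made-up C-like snippet (the snippet is hypothetical; the patterns are copied unchanged as raw strings):

```python
import re

# Unrolled patterns from the list, written as Python raw strings.
STRING_LITERAL = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')
BLOCK_COMMENT = re.compile(r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/')

# Hypothetical source snippet used only for illustration.
source = r'''s = "String\u0020:\"text\""; /* Comments */ t = "b"; /** doc ** block */'''

# Escaped quotes stay inside the literal; each comment ends at its first "*/".
print(STRING_LITERAL.findall(source))
print(BLOCK_COMMENT.findall(source))
```

In both patterns the normal case is a negated character class (`[^"\\]`, `[^*]`), and the loop body runs only when the rarer special character (a backslash, a `*` not followed by `/`) appears.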
