What is the advantage of unrolling the loop compared to shorter regex notation with quantified alternatives?

Question

Requirement: given two expressions, exp1 and exp2, we need to match any number of occurrences of either one. So I came up with:

(exp1 | exp2)*

However, in some places I see the following being used instead:

(exp1 * (exp2 exp1*)*)

What is the difference between the two? When would you use one over the other?

Hopefully a fiddle will make this clearer:

var regex1 = /^"([\x00-!#-[\]-\x7f]|\\")*"$/;
var regex2 = /^"([\x00-!#-[\]-\x7f]*(\\"[\x00-!#-[\]-\x7f]*)*)"$/;

var str = '"foo \\"bar\\" baz"';
var r1 = regex1.exec(str);
var r2 = regex2.exec(str);

EDIT: It looks like the two approaches behave differently when capturing groups. The second approach captures the entire string, while the first captures only the last iteration of the repeated group. See the updated fiddle.
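The capture difference described in the edit can be checked directly. The following is a minimal sketch that reuses the two patterns and the test string from the question; the variable names `capture1` and `capture2` are introduced here only for illustration.

```javascript
// Patterns and input copied from the question.
var regex1 = /^"([\x00-!#-[\]-\x7f]|\\")*"$/;
var regex2 = /^"([\x00-!#-[\]-\x7f]*(\\"[\x00-!#-[\]-\x7f]*)*)"$/;
var str = '"foo \\"bar\\" baz"';

// A quantified capturing group keeps only its last iteration,
// so the alternation form ends up holding a single character.
var capture1 = regex1.exec(str)[1];
// In the unrolled form the outer group is not repeated,
// so it holds everything between the surrounding quotes.
var capture2 = regex2.exec(str)[1];

console.log(capture1); // z
console.log(capture2); // foo \"bar\" baz
```

If the full content is needed with the alternation form, one option is to wrap the repeated group in an outer capturing group and make the inner one non-capturing.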

Answer

The difference between the two patterns is potential efficiency.

The (exp1 | exp2)* pattern contains an alternation, which automatically disables some internal regex matching optimizations. In addition, this regex has to try each alternative at every position in the string.

The (exp1 * (exp2 exp1*)*) expression is written according to the unroll-the-loop principle:

This optimisation technique is used to optimize repeated alternation of the form (expr1|expr2|...)*. These expressions are not uncommon, and the use of another repetition inside an alternation may also lead to a super-linear match. Super-linear matches arise from nondeterministic expressions such as (a*)*.

The unrolling-the-loop technique is based on the hypothesis that, in most cases, you know which alternative in a repeated alternation is the most usual and which one is exceptional. We will call the first one the normal case and the second one the special case. The general syntax of the unrolling-the-loop technique can then be written as:

normal* ( special normal* )*

So, exp1 in your example is the normal part, the one that occurs most often, and exp2 is expected to be less frequent. In that case, the unrolled pattern can be considerably more efficient than the other regex, since the normal* part grabs whole chunks of input at once without stopping to check the alternatives at each position.
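To make the normal* ( special normal* )* shape concrete, here is a minimal sketch that builds both forms from the same two fragments and checks that they accept and reject the same inputs. The helpers `buildAlternation` and `buildUnrolled` are hypothetical names introduced for this illustration. The sketch assumes that the normal and special parts can never match at the same position and that the special part never matches an empty string, which holds for the quoted-string fragments used here.

```javascript
// Hypothetical helpers: both take regex source fragments as strings.
function buildAlternation(normal, special) {
  // (normal|special)*
  return new RegExp('^(?:' + normal + '|' + special + ')*$');
}

function buildUnrolled(normal, special) {
  // normal* ( special normal* )*
  return new RegExp(
    '^(?:' + normal + ')*(?:' + special + '(?:' + normal + ')*)*$'
  );
}

var normal = '[^"\\\\]'; // regex source [^"\\] : any char except quote or backslash
var special = '\\\\.';   // regex source \\.    : a backslash followed by any char

var alternation = buildAlternation(normal, special);
var unrolled = buildUnrolled(normal, special);

// Empty input, plain text, an escaped quote, a dangling backslash, a bare quote.
var samples = ['', 'abc', 'a\\"b', 'a\\', 'a"b'];
var results = samples.map(function (s) {
  return { input: s, alternation: alternation.test(s), unrolled: unrolled.test(s) };
});
console.log(results);
```

Both forms describe the same language under the stated assumptions; the unrolled one only changes how the engine walks through the input.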

Let's test the simple "([^"\\]|\\.)*" regex against "some text here": the match takes 35 steps.

Unrolling it as "[^"\\]*(\\.[^"\\]*)*" reduces this to 6 steps, as there is much less backtracking.

NOTE: the number of steps shown at regex101.com does not directly mean that one regex is more efficient than another. However, the debug table shows where backtracking occurs, and backtracking is resource-consuming.

Let's then test the pattern efficiency with JS benchmark.js:

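// Set the global Benchmark reference before it is used.
Benchmark = window.Benchmark;
var suite = new Benchmark.Suite();
suite
  // Alternation form: (normal|special)*
  .add('Regular RegExp test', function() {
      '"some text here"'.match(/"([^"\\]|\\.)*"/);
    })
  // Unrolled form: normal*(special normal*)*
  .add('Unrolled RegExp test', function() {
      '"some text here"'.match(/"[^"\\]*(\\.[^"\\]*)*"/);
    })
  // Log the result of each test as it finishes.
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  // Report the fastest test once all of them have run.
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run({ 'async': true });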

<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.13.1/lodash.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/platform/1.3.1/platform.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/benchmark/2.1.0/benchmark.js"></script>

Results:

Regular RegExp test x 9,295,393 ops/sec ±0.69% (64 runs sampled)
Unrolled RegExp test x 12,176,227 ops/sec ±1.17% (64 runs sampled)
Fastest is Unrolled RegExp test
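For a quick check without loading benchmark.js, a naive timing loop along these lines can be used. This is only a rough sketch: the helper `timeMatches` and the iteration count are assumptions made for illustration, and a plain loop like this is far less reliable than a proper benchmarking library, so the timings will vary between engines and runs.

```javascript
// Hypothetical helper: run regex.test(input) repeatedly and time the loop.
function timeMatches(regex, input, iterations) {
  var matched = 0;
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    if (regex.test(input)) {
      matched++;
    }
  }
  return { matched: matched, ms: Date.now() - start };
}

var input = '"some text here"';
var iterations = 200000; // arbitrary count chosen for this sketch

// Same two patterns as in the benchmark above.
var regular = timeMatches(/"([^"\\]|\\.)*"/, input, iterations);
var unrolled = timeMatches(/"[^"\\]*(\\.[^"\\]*)*"/, input, iterations);

console.log('Regular:  ' + regular.ms + ' ms');
console.log('Unrolled: ' + unrolled.ms + ' ms');
```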

Also, since the unroll-the-loop concept is not language-specific, here is an online PHP test (the regular pattern yields ~0.45, and the unrolled one ~0.22).

Also see Unroll Loop, when to use.
