为什么字符类比交替更快? [英] Why is a character class faster than alternation?

查看:23
本文介绍了为什么字符类比交替更快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在一个例子中,使用字符类似乎比交替更快:
[abc] vs (a|b|c)
我听说有人推荐它,并通过使用 Time::HiRes 的简单测试验证了它(慢了约 10 倍).
还使用 (?:a|b|c) 以防捕获括号产生差异不会改变结果.
但我不明白为什么.我认为这是因为回溯,但我在每个位置看到它的方式有 3 个字符比较,所以我不确定回溯如何影响交替.这是实现的交替性质的结果吗?

It seems that using a character class is faster than the alternation in an example like:
[abc] vs (a|b|c)
I have heard about it being recommended and with a simple test using Time::HiRes I verified it (~10 times slower).
Also using (?:a|b|c) in case the capturing parenthesis makes a difference does not change the result.
But I can not understand why. I think it is because of backtracking but the way I see it at each position there are 3 character comparison so I am not sure how backtracking hits in affecting the alternation. Is it a result of the implementation's nature of alternation?

推荐答案

这是因为交替之间的OR"构造 | 回溯:如果第一个交替是不匹配,引擎必须在轮次匹配期间指针位置移动之前返回,以继续匹配下一个轮次;而字符类可以顺序前进.在禁用优化的正则表达式引擎上查看此匹配:

This is because the "OR" construct | backtracks between the alternation: If the first alternation is not matched, the engine has to return before the pointer location moved during the match of the alternation, to continue matching the next alternation; Whereas the character class can advance sequentially. See this match on a regex engine with optimizations disabled:

Pattern: (r|f)at
Match string: carat

Pattern: [rf]at
Match string: carat

但简而言之, 引擎优化这个(单个文字字符 -> 字符类)已经是一个不错的提示,表明交替是低效的.

But to be short, the fact that pcre engine optimizes this (single literal characters -> character class) away is already a decent hint that alternations are inefficient.

这篇关于为什么字符类比交替更快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆