为什么角色类比交替类要快? [英] Why is a character class faster than alternation?
问题描述
在类似这样的示例中,使用字符类似乎比替换更快:
[abc]
与(a|b|c)
我听说过有人建议使用它,并通过Time::HiRes
进行了简单测试,我对其进行了验证(速度要慢10倍左右).
如果捕获括号产生差异,也可以使用(?:a|b|c)
,而不会改变结果.
但是我不明白为什么.我认为这是因为回溯,但是我在每个位置看到它的方式都有3个字符的比较,所以我不确定回溯如何影响交替.是实现的交替性质的结果吗?
It seems that using a character class is faster than the alternation in an example like:
[abc]
vs (a|b|c)
I have heard about it being recommended and with a simple test using Time::HiRes
I verified it (~10 times slower).
Also using (?:a|b|c)
in case the capturing parenthesis makes a difference does not change the result.
But I can not understand why. I think it is because of backtracking but the way I see it at each position there are 3 character comparison so I am not sure how backtracking hits in affecting the alternation. Is it a result of the implementation's nature of alternation?
推荐答案
这是因为交替之间的或"构造|
backtracks :如果第一个交替不匹配,则引擎必须在交替匹配期间指针位置移动之前返回,以继续匹配下一个交替;而角色类可以顺序前进.在禁用优化的正则表达式引擎上查看此匹配项:
This is because the "OR" construct |
backtracks between the alternation: If the first alternation is not matched, the engine has to return before the pointer location moved during the match of the alternation, to continue matching the next alternation; Whereas the character class can advance sequentially. See this match on a regex engine with optimizations disabled:
Pattern: (r|f)at
Match string: carat
Pattern: [rf]at
Match string: carat
简而言之,pcre 引擎对此进行了优化(单个文字字符->字符类),这已经很好地暗示了替换效率低下.
But to be short, the fact that pcre engine optimizes this (single literal characters -> character class) away is already a decent hint that alternations are inefficient.
这篇关于为什么角色类比交替类要快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!