Matching text between delimiters: greedy or lazy regular expression?
Question
For the common problem of matching text between delimiters (e.g. < and >), there are two common patterns:
- using the greedy * or + quantifier in the form START [^END]* END, e.g. <[^>]*>, or
- using the lazy *? or +? quantifier in the form START .*? END, e.g. <.*?>.
Is there a particular reason to favour one over the other?
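To make the two forms concrete, here is a minimal sketch in Python's re module (the sample string and variable names are made up for illustration). On simple single-line input both forms find the same matches, while a plain greedy .* runs on to the last closing delimiter:

```python
import re

# Hypothetical sample input: two short delimited items on one line.
text = "a <b> c <i> d"

# Greedy quantifier over a negated character class.
greedy_negated = re.findall(r"<[^>]*>", text)

# Lazy quantifier over the dot.
lazy_dot = re.findall(r"<.*?>", text)

# For contrast: a plain greedy dot swallows everything up to the last '>'.
plain_greedy = re.findall(r"<.*>", text)

print(greedy_negated)
print(lazy_dot)
print(plain_greedy)
```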
Recommended answer
Some advantages:
[^>]*:
- More expressive.
- Captures newlines regardless of the /s flag.
- Considered quicker, because the engine doesn't have to backtrack to find a successful match (with [^>] the engine doesn't make choices: we give it only one way to match the pattern against the string).
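The newline and backtracking points can be checked with a small sketch. It assumes Python's re module, where re.DOTALL plays the role of the /s flag; the sample strings are invented for illustration:

```python
import re

# Hypothetical input with a newline between the delimiters.
multiline = "<first\nsecond>"

# The negated class matches the newline without any flag.
negated = re.search(r"<[^>]*>", multiline)

# The dot does not match a newline by default, so this finds nothing...
lazy_default = re.search(r"<.*?>", multiline)

# ...unless DOTALL (Python's counterpart of /s) is set.
lazy_dotall = re.search(r"<.*?>", multiline, re.DOTALL)

# When more pattern follows the closing delimiter, the lazy dot may
# backtrack and extend past a '>' to make the whole pattern succeed,
# whereas the negated class can never cross a '>'.
tail = "<a>b>x"
lazy_tail = re.search(r"<.*?>x", tail)
negated_tail = re.search(r"<[^>]*>x", tail)

print(negated.group() == multiline, lazy_default, lazy_dotall.group() == multiline)
print(lazy_tail.group(), negated_tail)
```

Whether crossing the delimiter is wanted depends on the input, but with the negated class the behaviour is fixed by the pattern itself rather than by what follows it.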
.*?:
- No "code duplication": the end character only appears once.
- Simpler in cases where the end delimiter is more than one character long (a character class would not work in this case). A common alternative is (?:(?!END).)*. This is even worse if the END delimiter is another pattern.
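As a rough illustration of the multi-character case, the sketch below uses {{ and }} as stand-in START and END delimiters (an assumption for the example, not something from the question) and compares the lazy form, the negative-lookahead alternative, and a character class that excludes only the first character of the end delimiter:

```python
import re

# Hypothetical input: the first item contains a lone '}' before its real end.
text = "{{ a } b }} and {{ c }}"

# Lazy dot: the end delimiter is written once.
lazy = re.findall(r"\{\{.*?\}\}", text)

# Lookahead alternative: consume a character only if END does not start here.
tempered = re.findall(r"\{\{(?:(?!\}\}).)*\}\}", text)

# A character class can only exclude single characters, so it stops at
# the lone '}' and misses the first item.
char_class = re.findall(r"\{\{[^}]*\}\}", text)

print(lazy)
print(tempered)
print(char_class)
```

The first two patterns agree here; the lookahead version is simply harder to read, which is the trade-off the point above describes.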