Is it better to use a non-greedy quantifier or a lookahead?


Problem description

I have a possibly large block of text to search for instances of [[...]], where the ... can be anything, including other brackets (though they cannot be nested; the first instance of ]] after [[ ends the match).

I can think of two ways to match this text:

  • Using a non-greedy quantifier: /\[\[.+?\]\]/
  • Using a lookahead: /\[\[(?:(?!\]\]).)+\]\]/

Is one choice inherently better than the other from a performance standpoint? (I'd say the first is probably more readable.) I recall reading that it's better not to use non-greedy quantifiers, but I cannot find a source for that now.
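
For reference, here is a minimal sketch that runs both patterns over the same text, assuming Python's `re` module as the engine (the question does not name one). The sample string is made up for illustration. On typical input the two patterns return the same matches, but they are not strictly identical: when `]]` immediately follows `[[`, the non-greedy version can still consume one `]` as content, while the lookahead version refuses to.

```python
import re

# The two patterns from the question, written as Python raw strings.
NON_GREEDY = re.compile(r"\[\[.+?\]\]")
LOOKAHEAD = re.compile(r"\[\[(?:(?!\]\]).)+\]\]")

# Hypothetical sample text, invented for this sketch.
sample = "intro [[a]b]] middle [[x [y] z]] end [[unclosed"

# Neither pattern has a capturing group, so findall returns whole matches.
non_greedy_matches = NON_GREEDY.findall(sample)
lookahead_matches = LOOKAHEAD.findall(sample)

print(non_greedy_matches)  # ['[[a]b]]', '[[x [y] z]]']
print(lookahead_matches)   # ['[[a]b]]', '[[x [y] z]]']

# Edge case where the patterns disagree: "]]" directly after "[[".
edge = "[[]]]"
edge_non_greedy = NON_GREEDY.search(edge)  # matches the whole string
edge_lookahead = LOOKAHEAD.search(edge)    # no match: the lookahead blocks the first "."
```

Note that in Python `.` does not match a newline unless `re.DOTALL` is passed, so both patterns stop at line breaks in this sketch.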

Solution

It is better to use a non-greedy quantifier in this case.

Take this example string: "[[a]b]]"

Non-greedy quantifier

       \[\[.+?\]\]
Atom # 1 2 3  4 5

  1. Atom #1 \[ matches
  2. Atom #2 \[ matches
  3. Atom #3 .+? matches the "a"
  4. Atom #4 \] matches
  5. Atom #5 \] fails, back to #3 but keep string position
  6. Atom #3 .+? matches the "]"
  7. Atom #4 \] fails, back to #3 but keep string position
  8. Atom #3 .+? matches the "b"
  9. Atom #4 \] matches
  10. Atom #5 \] matches
  11. success

Look-ahead:

       \[\[(?:(?!\]\]).)+\]\]
Atom # 1 2 3  4       5  6 7

  1. Atom #1 \[ matches
  2. Atom #2 \[ matches
  3. Atom #4 (?!\]\]) succeeds (i.e. non-match) immediately at "a", go on
  4. Atom #5 . matches the "a", repeat at #3
  5. Atom #4 (?!\]\]) achieves partial match at "]"
  6. Atom #4 (?!\]\]) succeeds (i.e. non-match) at "b", go on
  7. Atom #5 . matches the "]", repeat at #3
  8. Atom #4 (?!\]\]) succeeds (i.e. non-match) immediately at "b", go on
  9. Atom #5 . matches the "b", repeat at #3
  10. Atom #4 (?!\]\]) achieves partial match at "]"
  11. Atom #4 (?!\]\]) achieves full match at "]", ergo: #4 fails, exit #3
  12. Atom #6 \] matches
  13. Atom #7 \] matches
  14. success

So it looks like the non-greedy quantifier has less work to do.
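
To picture where the extra work comes from, the loop formed by atoms #3 to #5 of the lookahead pattern can be written out by hand. The sketch below is only an illustration in Python, not how any regex engine is actually implemented; the function name `scan_with_lookahead` and the `checks` counter are hypothetical. It tests for `]]` before consuming every single character, which is the repeated lookahead the trace above shows.

```python
def scan_with_lookahead(text, start=0):
    """Hand-written stand-in for \\[\\[(?:(?!\\]\\]).)+\\]\\] anchored at `start`.

    Returns (end_index, checks): end_index is the position just past the
    closing "]]" or -1 if there is no match; checks counts how many times
    the "is ']]' next?" test (atom #4) was evaluated.
    """
    if not text.startswith("[[", start):
        return -1, 0
    pos = start + 2
    checks = 0
    while pos < len(text):
        checks += 1  # one lookahead evaluation per candidate character
        # Stop when "]]" starts here (lookahead fails) or at a newline,
        # since "." does not match "\n" by default.
        if text.startswith("]]", pos) or text[pos] == "\n":
            break
        pos += 1  # atom #5: "." consumes one character
    # The "+" needs at least one consumed character, then "]]" must follow.
    if pos > start + 2 and text.startswith("]]", pos):
        return pos + 2, checks
    return -1, checks


end, checks = scan_with_lookahead("[[a]b]]")
print(end, checks)  # 7 4
```

For the example string, the lookahead is evaluated four times (at "a", "]", "b" and the final "]"), matching the four groups of atom #4 steps in the trace.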

Disclaimer: This is an artificial example, and real-life performance may differ depending on the input, the actual expression and the implementation of the regex engine. I'm only 98% sure that what I outlined here is what actually happens, so I'm open to corrections. Also, as with all performance tips, don't take this at face value; run your own benchmark comparisons if you want to know for sure.
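
A benchmark along those lines might look like the following sketch, again assuming Python's `re` module and the standard `timeit` module. The input text, its size, and the helper names `count_matches` and `benchmark` are arbitrary choices made for illustration; timings will vary by engine, version and input, so no numbers are quoted here.

```python
import re
import timeit

NON_GREEDY = re.compile(r"\[\[.+?\]\]")
LOOKAHEAD = re.compile(r"\[\[(?:(?!\]\]).)+\]\]")

# Hypothetical input: two bracketed spans per repetition, 500 repetitions.
text = "some filler text [[a]b]] more filler [[x [y] z]] " * 500


def count_matches(pattern, haystack):
    """Scan the whole haystack and count the matches found."""
    return sum(1 for _ in pattern.finditer(haystack))


def benchmark(pattern, haystack, repeats=3, number=5):
    """Best total time in seconds for `number` full scans, over `repeats` runs."""
    return min(
        timeit.repeat(lambda: count_matches(pattern, haystack),
                      repeat=repeats, number=number)
    )


non_greedy_time = benchmark(NON_GREEDY, text)
lookahead_time = benchmark(LOOKAHEAD, text)
print(f"non-greedy: {non_greedy_time:.4f}s")
print(f"lookahead:  {lookahead_time:.4f}s")
```

Taking the minimum over several runs reduces noise from other processes; it is worth repeating the measurement with text that resembles the real input, since match density and span length affect both patterns.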
