How do look-ahead and look-behind support the concept of zero-width assertions in Ruby regular expressions?

Question

I just went through the concept of zero-width assertions in the documentation, and a few quick questions came to mind:

  • Why are they called zero-width assertions?
  • How do look-ahead and look-behind support the zero-width assertion concept?
  • What do the four constructs used in the patterns below, (?!s), (?=s), (?<=s) and (?<!s), tell the engine to do? Can you help me understand what is actually going on?

I also tried some tiny snippets to understand the logic, but I am not very confident about their output:

irb(main):001:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"
irb(main):002:0> "foresight".sub(/(?=s)ight/, 'ee')
=> "foresight"
irb(main):003:0> "foresight".sub(/(?<=s)ight/, 'ee')
=> "foresee"
irb(main):004:0> "foresight".sub(/(?<!s)ight/, 'ee')
=> "foresight"

Can anyone help me understand this?

Edit

Here I tried two snippets. One uses a zero-width assertion:

irb(main):002:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"

The other does not use a zero-width assertion:

irb(main):003:0> "foresight".sub(/ight/, 'ee')
=> "foresee"

Both of the above produce the same output. How does each regexp move through the string internally to produce it? Could you help me visualize that?

Thanks

Recommended answer

Regular expressions match from left to right, and move a sort of "cursor" along the string as they go. If your regex contains a regular character like a, this means: "if there's a letter a in front of the cursor, move the cursor ahead one character, and keep going. Otherwise, something's wrong; back up and try something else." So you might say that a has a "width" of one character.

A "zero-width assertion" is just that: it asserts something about the string (i.e., doesn't match if some condition doesn't hold), but it doesn't move the cursor forwards, because its "width" is zero.

You're probably already familiar with some simpler zero-width assertions, like ^ and $. In Ruby these match the start and end of a line (\A and \z anchor to the start and end of the whole string). If the cursor isn't at such a position when it sees those symbols, the regex engine will fail, back up, and try something else. But they don't actually move the cursor forwards, because they don't match characters; they only check where the cursor is.

Lookahead and lookbehind work the same way. When the regex engine tries to match them, it checks around the cursor to see if the right pattern is ahead of or behind it, but in case of a match, it doesn't move the cursor.

Consider:

/(?=foo)foo/.match 'foo'

This will match! The regex engine goes like this:

  1. Start at the beginning of the string: |foo.
  2. The first part of the regex is (?=foo). This means: only match if foo appears after the cursor. Does it? Well, yes, so we can proceed. But the cursor doesn't move, because this is zero-width. We still have |foo.
  3. Next is f. Is there an f in front of the cursor? Yes, so proceed, and move the cursor past the f: f|oo.
  4. Next is o. Is there an o in front of the cursor? Yes, so proceed, and move the cursor past the o: fo|o.
  5. Same thing again, bringing us to foo|.
  6. We reached the end of the regex, and nothing failed, so the pattern matches.
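
The steps above can be checked directly in Ruby. This is a minimal sketch (the variable names full and zero are only for illustration): the first match shows that the literal foo consumes the text, and the second shows that the lookahead on its own succeeds while consuming nothing.

```ruby
# Lookahead followed by the literal text:
# the lookahead only checks, the literal foo does the consuming.
full = /(?=foo)foo/.match('foo')
puts full[0].inspect   # the text the cursor moved over

# The lookahead alone also succeeds, but its match is empty.
zero = /(?=foo)/.match('foo')
puts zero[0].inspect   # the matched text (an empty string)
puts zero.begin(0)     # offset where the match starts
puts zero.end(0)       # offset where the match ends (same as the start)
```

The start and end offsets of the second match are equal, which is what "zero width" means in practice.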

On your four assertions in particular:

(?=...) is "lookahead"; it asserts that ... does appear after the cursor.

1.9.3p125 :002 > 'jump june'.gsub(/ju(?=m)/, 'slu')
 => "slump june" 

The "ju" in "jump" matches because an "m" comes next. But the "ju" in "june" doesn't have an "m" next, so it's left alone.

Since it doesn't move the cursor, you have to be careful when putting anything after it. (?=a)b will never match anything, because it checks that the next character is a, then also checks that the same character is b, which is impossible.
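
A small sketch of that pitfall, using made-up variable names (impossible, narrowed): the first pattern can never match, while a lookahead followed by something that can overlap with it works fine.

```ruby
# (?=a) requires the next character to be "a";
# b then requires that same character to be "b".
impossible = /(?=a)b/
puts impossible.match('ab').inspect
puts impossible.match('ba').inspect

# A lookahead is useful when the pattern after it can overlap with it:
# \w accepts any word character, and the lookahead narrows it to "a".
narrowed = /(?=a)\w/
puts narrowed.match('ba')[0]
```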

(?<=...) is "lookbehind"; it asserts that ... does appear before the cursor.

1.9.3p125 :002 > 'four flour'.gsub(/(?<=f)our/, 'ive')
 => "five flour" 

The "our" in "four" matches because there's an "f" immediately before it, but the "our" in "flour" has an "l" immediately before it so it doesn't match.

Like above, you have to be careful with what you put before it. a(?<=b) will never match, because it checks that the next character is a, moves the cursor, then checks that the previous character was b.
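
The same pitfall for lookbehind, as a minimal sketch with illustrative variable names (impossible_behind, narrowed_behind):

```ruby
# a consumes an "a", then (?<=b) demands that the character
# just consumed was a "b", which cannot be true.
impossible_behind = /a(?<=b)/
puts impossible_behind.match('ab').inspect
puts impossible_behind.match('ba').inspect

# With \w before it, the lookbehind narrows the consumed character to "b".
narrowed_behind = /\w(?<=b)/
m = narrowed_behind.match('ab')
puts m[0]        # the consumed character
puts m.begin(0)  # its position in the string
```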

(?!...) is "negative lookahead"; it asserts that ... does not appear after the cursor.

1.9.3p125 :003 > 'child children'.gsub(/child(?!ren)/, 'kid')
 => "kid children"

"child" matches, because what comes next is a space, not "ren". "children" doesn't.

This is probably the one I get the most use out of; finely controlling what can't come next comes in handy.

(?<!...) is "negative lookbehind"; it asserts that ... does not appear before the cursor.

1.9.3p125 :004 > 'foot root'.gsub(/(?<!r)oot/, 'eet')
 => "feet root" 

The "oot" in "foot" is fine, since there's no "r" before it. The "oot" in "root" clearly has an "r".

As an additional restriction, most regex engines require that the ... inside a lookbehind (positive or negative) has a fixed length. So you can't use ?, +, *, or {n,m} there.
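
One way to see how your own Ruby handles this is to try compiling the pattern at runtime. The helper below, lookbehind_compiles?, is a hypothetical name for illustration only; it relies on Regexp.new raising RegexpError for a pattern the engine rejects. Whether the variable-length pattern is rejected depends on the engine and version, so no output is assumed for it here.

```ruby
# Hypothetical helper: returns true if the engine accepts the pattern source,
# false if Regexp.new raises RegexpError while compiling it.
def lookbehind_compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Fixed-length lookbehind, as used in the answer.
puts lookbehind_compiles?('(?<!r)oot')

# Variable-length lookbehind; the result depends on the engine and version.
puts lookbehind_compiles?('(?<!r+)oot')
```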

You can also nest these and otherwise do all kinds of crazy things. I use them mainly for one-offs I know I'll never have to maintain, so I don't have any great examples of real-world applications handy; honestly, they're weird enough that you should try to do what you want some other way first. :)

Afterthought: The syntax comes from Perl regular expressions, which use (? followed by various symbols for a lot of extended syntax, because a ? directly after an opening parenthesis would otherwise be invalid. So <= doesn't mean anything by itself; (?<= is one entire token, meaning "this is the start of a lookbehind". It's like how += and ++ are separate operators, even though they both start with +.

They're easy to remember, though: = indicates looking forwards (or, really, "here"), < indicates looking backwards, and ! has its traditional meaning of "not".

Regarding your later example:

irb(main):002:0> "foresight".sub(/(?!s)ight/, 'ee')
=> "foresee"

irb(main):003:0> "foresight".sub(/ight/, 'ee')
=> "foresee"

Yes, these produce the same output. This is that tricky bit with using lookahead:

  1. The regex engine has tried some things, but they haven't worked, and now it's at fores|ight.
  2. It checks (?!s). Is the character after the cursor s? No, it's i! So that part matches and the matching continues, but the cursor doesn't move, and we still have fores|ight.
  3. It checks ight. Does ight come after the cursor? Well, yes, it does, so move the cursor: foresight|.
  4. We're done!

The cursor moved over the substring ight, so that's the full match, and that's what gets replaced.
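
The walkthrough can be confirmed by inspecting the MatchData for both patterns. This is a small sketch; with_lookahead and plain are illustrative names.

```ruby
with_lookahead = /(?!s)ight/.match('foresight')
plain          = /ight/.match('foresight')

puts with_lookahead.pre_match   # text before the match: where the cursor stood
puts with_lookahead[0]          # text the cursor moved over
puts with_lookahead.begin(0)    # offset where the match starts

# Both patterns match the same span, so sub replaces the same text.
puts with_lookahead.begin(0) == plain.begin(0)
puts with_lookahead[0] == plain[0]
```

The negative lookahead adds a check at the position after "fores", but since the next character there is "i", the check always passes and changes nothing.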

Doing (?!a)b is useless, since you're saying: the next character must not be a, and it must be b. But that's the same as just matching b!

This can be useful sometimes, but you need a more complex pattern: for example, (?!3)\d will match any digit that isn't a 3.
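
A quick sketch of both points, using sample strings chosen only for illustration: (?!a)b behaves exactly like b, while (?!3)\d really does exclude one digit.

```ruby
# The negative lookahead adds nothing here: a "b" is never an "a".
redundant = 'abab'.gsub(/(?!a)b/, 'X')
plain_b   = 'abab'.gsub(/b/, 'X')
puts redundant
puts plain_b

# Here the lookahead matters: \d alone would also match the 3s.
no_threes = '1234 3'.gsub(/(?!3)\d/, 'x')
puts no_threes
```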

This is what you wanted:

1.9.3p125 :001 > "foresight".sub(/(?<!s)ight/, 'ee')
 => "foresight" 

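This asserts that s does not come before ight. In "foresight" an s does come right before ight, so nothing matches and the string is left unchanged.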
