Regex negative lookaround with 2 adjacent matches

Question

Should be an easy question for someone out there:

If I run this JavaScript:

var regex = new RegExp("(?!cat)dog(?!cat)","g");
var text =  "catdogcat catdogdog catdogdogcat".replace(regex,"000");
console.log(text);

It outputs this:

catdogcat cat000000 cat000dogcat

But I thought it should output:

catdogcat cat000000 cat000000cat

Why isn't the second "dog" in catdogdogcat replaced with 000?

Edit: I want to replace "dog" whenever it doesn't have cat on BOTH sides. In catdogdogcat, both dogs fulfill this requirement, so both should be replaced. Obviously I don't understand these negative lookarounds...

Recommended answer

There are two problems with your approach.


  1. Your first lookahead needs to be a lookbehind. When you write (?!cat), the engine checks that the next three characters are not cat and then resets to where it started (that's how lookaheads work), and then you try to match dog at those same three characters. Therefore, the lookahead doesn't add anything: if you can match dog, you obviously can't match cat at the same position. What you want is a lookbehind, (?<!cat), which checks that the preceding characters are not cat. Unfortunately, JavaScript did not support lookbehind when this was written (lookbehind assertions were only added in ECMAScript 2018).
  2. You want to logically OR the two lookarounds. In your pattern, if either lookaround fails, the whole pattern fails. Hence both requirements (no cat in front and no cat behind) need to be fulfilled, which is an AND. But you actually want an OR. If lookbehinds were supported, that would look more like (?<!cat)dog|dog(?!cat) (note that the alternation splits the entire pattern apart). But as I said, lookbehinds are not available here. The reason you seemed to have OR-ed the two lookarounds in your first catdogdog bit is that the preceding cat was simply not checked (see point 1).
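
For engines that do have lookbehind (JavaScript gained it in ECMAScript 2018), the OR-ed pattern from point 2 can be tried directly. This is only a minimal sketch: the helper name replaceDogsWithLookbehind is made up for illustration, and it assumes a runtime with lookbehind support.

```javascript
// Hypothetical helper; assumes an engine with lookbehind support (ES2018+).
function replaceDogsWithLookbehind(text) {
  // Replace "dog" when it is NOT preceded by "cat" OR NOT followed by "cat".
  return text.replace(/(?<!cat)dog|dog(?!cat)/g, "000");
}

console.log(replaceDogsWithLookbehind("catdogcat catdogdog catdogdogcat"));
// catdogcat cat000000 cat000000cat
```

Only the dog that has a cat on both sides survives, which is the output the question asked for.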

How do we work around the missing lookbehind then? Kolink's answer suggests (?!cat)...dog, which puts the lookaround at the position where a cat would start, and uses a lookahead. This has two new problems: it cannot match a dog at the beginning of the string (because the three characters in front are required), and it cannot match two consecutive dogs, because matches cannot overlap (after matching the first dog, the engine requires three new characters for the ..., which would consume the next dog before actually matching dog again).
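
Both limitations are easy to reproduce. In the sketch below, the capture group around the three leading characters and the replacement callback are my own additions, so that the prefix can be written back. They are not necessarily how Kolink's answer does it, and the helper name is hypothetical.

```javascript
// Hypothetical helper illustrating the (?!cat)...dog idea. Group 1 keeps the
// three characters in front of "dog" so they can be written back unchanged.
function replaceWithThreeCharPrefix(text) {
  return text.replace(/(?!cat)(...)dog/g, (match, prefix) => prefix + "000");
}

// The first dog sits at the start of the string, so it can only serve as prefix.
console.log(replaceWithThreeCharPrefix("dogdog"));    // dog000

// After "xxxdog" is consumed, the second dog is needed as the next prefix,
// and no further "dog" follows it, so it is left alone.
console.log(replaceWithThreeCharPrefix("xxxdogdog")); // xxx000dog
```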

Sometimes you can work around it by reversing both the pattern and the string, hence turning the lookbehind into a lookahead, but in your case that would turn the lookahead at the end into a lookbehind.

We have to be a bit cleverer. Since matches cannot overlap, we could try to match catdogcat explicitly without replacing it (hence skipping it in the target string), and then just replace all other dogs we find. We put the two cases in an alternation, so they are both tried at every position in the string (with the catdogcat option taking precedence, although it doesn't really matter here). The problem is how to get conditional replacement strings. But let's look at what we've got so far:

text.replace(/(catdog)(?=cat)|dog/g, "$1[or 000 if $1 didn't match]")

So in the first alternative we match a catdog, capture it into group 1, and check that another cat follows. In the replacement string we simply write $1 back. The beauty is that if the second alternative matched, the first group will be unused and hence be an empty string in the replacement. The reason why we only match catdog and use a lookahead, instead of matching catdogcat right away, is again overlapping matches. If we used catdogcat, then in the input catdogcatdogcat the first match would consume everything up to and including the second cat, hence the second dog could not be recognized by the first alternative.
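
To make the overlap issue concrete, the two variants can be compared on catdogcatdogcat by listing what each one matches. This is just an illustrative sketch; listMatches is a made-up helper built on String.prototype.matchAll.

```javascript
// Hypothetical helper: list every match of a global regex as [index, text] pairs.
function listMatches(regex, text) {
  return Array.from(text.matchAll(regex), (m) => [m.index, m[0]]);
}

const input = "catdogcatdogcat";

// With the lookahead, the middle "cat" is not consumed, so both protected
// dogs are matched by the first alternative.
const withLookahead = listMatches(/(catdog)(?=cat)|dog/g, input);
console.log(withLookahead);    // [[0, "catdog"], [6, "catdog"]]

// Consuming the trailing "cat" leaves the second dog to the plain "dog"
// alternative, which would then wrongly replace it.
const withoutLookahead = listMatches(/(catdogcat)|dog/g, input);
console.log(withoutLookahead); // [[0, "catdogcat"], [9, "dog"]]
```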

Now the only question is: how do we get a 000 into the replacement if the second alternative was used?

Unfortunately, we can't conjure up conditional replacements that are not part of the input string. The trick is to add a 000 to the end of the input string, capture that in a lookahead if we find a dog, and then write that back:

text.replace(/$/, "000")                            
    .replace(/(catdog)(?=cat)|dog(?=.*(000))/g, "$1$2")
    .replace(/000$/, "")

The first replacement adds 000 to the end of the string.

The second replacement matches either catdog (checking that another cat follows) and captures it into group 1 (leaving 2 empty) or matches dog and captures 000 into group 2 (leaving group 1 empty). Then we write $1$2 back, which will be either the unadorned catdog or 000.

The third replacement gets rid of our extraneous 000 at the end of the string.
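
Put together and run against the input from the question, the three steps might look like the sketch below. The wrapper function name is hypothetical. Note also that . does not match line breaks, so I am assuming single-line input here.

```javascript
// Hypothetical wrapper around the three chained replacements.
// Assumes single-line input, because "." does not match line terminators.
function replaceDogsWithSentinel(text) {
  return text
    // 1. Append the sentinel "000" to the end of the string.
    .replace(/$/, "000")
    // 2. Write back either the protected "catdog" ($1) or the captured "000" ($2).
    .replace(/(catdog)(?=cat)|dog(?=.*(000))/g, "$1$2")
    // 3. Remove the sentinel again.
    .replace(/000$/, "");
}

console.log(replaceDogsWithSentinel("catdogcat catdogdog catdogdogcat"));
// catdogcat cat000000 cat000000cat
```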

If you are not a fan of preparing the string and of the lookahead in the second alternative, you can instead use a slightly simpler regex with a replacement callback:

text.replace(/(catdog)(?=cat)|dog/g, function(match, firstGroup) {
    return firstGroup ? firstGroup : "000"
})

With this version of replace, the supplied function gets called for each match and its return value is used as the replacement string. The function's first parameter is the entire match, the second parameter is the first capturing group (which will be undefined if the group doesn't participate in the match), and so on.

So in the replacement callback we are free to conjure up our 000 if firstGroup is undefined (i.e. the dog alternative matched) or just return firstGroup if it is present (i.e. the catdog alternative matched). This is a bit more concise and possibly easier to understand. However, the overhead of calling the function can make it slower (although whether that matters depends on how often you want to do this). Pick your favorite!
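
As a quick sanity check, the callback variant can be wrapped in a function and run on a few inputs. This is a sketch only: the function name is made up, and the extra inputs beyond the one from the question are my own.

```javascript
// Hypothetical wrapper around the callback-based replacement.
function replaceDogsWithCallback(text) {
  return text.replace(/(catdog)(?=cat)|dog/g, (match, firstGroup) => {
    // firstGroup is undefined when the plain "dog" alternative matched.
    return firstGroup !== undefined ? firstGroup : "000";
  });
}

console.log(replaceDogsWithCallback("catdogcat catdogdog catdogdogcat"));
// catdogcat cat000000 cat000000cat
console.log(replaceDogsWithCallback("catdogcatdogcat"));
// catdogcatdogcat
console.log(replaceDogsWithCallback("dogcat"));
// 000cat
```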
