Regex: negative look-ahead between two matches


Problem description

I'm trying to build a regex somewhat like this:

[match-word] ... [exclude-specific-word] ... [match-word]

This seems to work with a negative look-ahead, but I run into a problem when I have a case like this:

[match-word] ... [exclude-specific-word] ... [match-word] ... [excluded word appears again]

I want the above sentence to match, but the negative look-ahead between the first and the second match word "spills over" past the second match word, so the sentence as a whole is rejected.

Let's look at a practical example.

I want to match every sentence that has the word "i" and the word "pie", but not the word "hate" in between those two words. I have these three sentences:

i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this

I have this regex:

^i(?!.*hate).*pie

(Word boundaries removed for clarity; the original is: ^i\b(?!.*\bhate\b).*\bpie\b)

This matches the first sentence but not the second one, because the negative look-ahead scans the entire rest of the string, not just the part before "pie".
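The behaviour described above can be reproduced with a short check. This is only a sketch using Python's `re` module (the question itself uses JRegex, so the engine is an assumption here); the pattern is the boundary-free one quoted in the question, and the variable names are illustrative.

```python
import re

# The pattern from the question, with word boundaries removed as in the post.
naive = re.compile(r"^i(?!.*hate).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
]

# The look-ahead runs once, right after "i", and inspects everything after it,
# so any later "hate" rejects the sentence, even one that follows "pie".
results = [bool(naive.match(s)) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Only the first sentence matches; the second is rejected even though its "hate" appears after "pie".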

Is there a way to limit the negative look-ahead, so that it's satisfied if it encounters "pie" before it encounters "hate"?

Note: in my implementation, there may be other terms following this regex (it's built dynamically by a grammar search engine), for instance:

^i(?!.*hate).*pie.*donuts

I'm currently using JRegex, but could probably switch to JDK Regex if necessary.

Update: I forgot to mention something in my initial question:

It's possible that the "negative" construct appears again later in the sentence, after the second match word. I still want to match the sentence in that case.

To clarify, look at these sentences:

i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
i sure like eating pie, but i like donuts and i hate making pie <- Do want to match this

rob's answer works perfectly for this extra constraint, so I'm accepting that one.

Recommended answer

At every character between your start and stop words, you have to make sure that neither your negative word nor your stop word begins at that position. Like this (where I've included a little white space for readability):

^i ( (?!hate|pie) . )* pie
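The pattern above omits the word boundaries from the question's original regex. As a hedged sketch in Python's `re` (not the asker's JRegex setup), the snippet below compares the boundary-free form with one where the `\b` anchors are restored. The sample sentence "i whatever pie" is made up for illustration: "whatever" contains the letters "hate" without being the word "hate".

```python
import re

# Tempered pattern without word boundaries, as in the answer.
plain = re.compile(r"^i(?:(?!hate|pie).)*pie")

# Same pattern with the \b anchors from the question's original regex restored.
bounded = re.compile(r"^i\b(?:(?!\bhate\b|\bpie\b).)*\bpie\b")

# Made-up sample: "whatever" contains the letters "hate" but is not that word.
sample = "i whatever pie"

plain_hit = bool(plain.match(sample))      # blocked by the "hate" inside "whatever"
bounded_hit = bool(bounded.match(sample))  # only the whole word "hate" would block
print(plain_hit, bounded_hit)
```

Without the boundaries, the look-ahead rejects any position where the letters "hate" begin, so the sample fails to match. With them, only the standalone word is excluded.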

Here's a Python program to test things.

import re

# Each entry pairs a sentence with whether it is expected to match.
test = [ ('i sure like eating pie, but i love donuts', True),
         ('i sure like eating pie, but i hate donuts', True),
         ('i sure hate eating pie, but i like donuts', False) ]

# re.X (verbose mode) makes the engine ignore the readability whitespace.
rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)

for t, v in test:
    m = rx.match(t)
    print(t, "pass" if bool(m) == v else "fail")
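The question also mentions that the regex is built dynamically and may be followed by further terms, and the update adds a fourth sentence in which "hate" appears after the first "pie". The sketch below covers both cases under stated assumptions: `build_pattern` and its parameters are hypothetical names, not part of the asker's grammar search engine, and it targets Python's `re` rather than JRegex.

```python
import re

def build_pattern(start, excluded, stop, following=()):
    """Hypothetical helper: start word, no excluded word before stop word, then later terms."""
    s, x, e = (re.escape(word) for word in (start, excluded, stop))
    # Step one character at a time, refusing to step onto the excluded or stop word.
    pattern = rf"^{s}\b(?:(?!\b{x}\b|\b{e}\b).)*\b{e}\b"
    # Terms after the stop word are unrestricted, as in ^i(?!.*hate).*pie.*donuts.
    for term in following:
        pattern += rf".*\b{re.escape(term)}\b"
    return re.compile(pattern)

rx = build_pattern("i", "hate", "pie", following=["donuts"])

sentences = [
    "i sure like eating pie, but i love donuts",
    "i sure like eating pie, but i hate donuts",
    "i sure hate eating pie, but i like donuts",
    "i sure like eating pie, but i like donuts and i hate making pie",
]

results = [bool(rx.match(s)) for s in sentences]
for sentence, matched in zip(sentences, results):
    print(matched, sentence)
```

Because the look-ahead is checked only while scanning between the start and stop words, a "hate" that appears after the first "pie" no longer affects the result.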
