Regex: negative look-ahead between two matches
Question
I'm trying to build a regex somewhat like this:
[match-word] ... [exclude-specific-word] ... [match-word]
This seems to work with a negative look-ahead, but I'm running into a problem when I have a case like this:
[match-word] ... [exclude-specific-word] ... [match-word] ... [excluded word appears again]
I want the above sentence to match, but the negative look-ahead between the first and the second matched word "spills over" so the second word is never matched.
Let's look at a practical example.
I want to match every sentence that has the word "i" and the word "pie", but not the word "hate" between those two words. I have these three sentences:
i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
I have this regex:
^i(?!.*hate).*pie (I've removed the word boundaries for clarity; the original is ^i\b(?!.*\bhate\b).*\bpie\b)
This matches the first sentence but not the second one, because the negative look-ahead scans the whole string.
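To see the "spill-over" concretely, here is a small sketch that runs the pattern above (word boundaries stripped) through Python's `re` module. It is only an illustration in Python; the question targets JRegex/JDK regex, where a look-ahead is expected to behave the same way.

```python
import re

# The question's pattern with the word boundaries stripped out.
naive = re.compile(r"^i(?!.*hate).*pie")

sentences = [
    "i sure like eating pie, but i love donuts",  # want: match
    "i sure like eating pie, but i hate donuts",  # want: match
    "i sure hate eating pie, but i like donuts",  # want: no match
]

for s in sentences:
    # (?!.*hate) runs once, right after the leading "i"; its .* may scan to
    # the end of the string, so any "hate" anywhere later makes it fail,
    # even one that comes after "pie".
    print(bool(naive.match(s)), s)
# Prints True for the first sentence and False for the other two.
```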
Is there a way to limit the negative look-ahead, so that it's satisfied if it encounters "pie" before it encounters "hate"?
Note: in my implementation, there may be other terms following this regex (it's built dynamically from a grammar search engine), for instance:
^i(?!.*hate).*pie.*donuts
I'm currently using JRegex, but could probably switch to JDK Regex if necessary.
Update: I forgot to mention something in my initial question:
The "negative" construct may also appear later in the sentence, and I still want the sentence to match whenever possible, even if the negative construct appears further on.
To clarify, look at these sentences:
i sure like eating pie, but i love donuts <- Want to match this
i sure like eating pie, but i hate donuts <- Want to match this
i sure hate eating pie, but i like donuts <- Don't want to match this
i sure like eating pie, but i like donuts and i hate making pie <- Do want to match this
rob's answer works perfectly for this extra constraint, so I'm accepting that one.
Recommended answer
At every character between your start and stop words, you have to make sure that it doesn't match your negative or stop word. Like this (where I've included a little white space for readability):
^i ( (?!hate|pie) . )* pie
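The per-character check that pattern performs can be spelled out in plain Python. This is a minimal sketch, assuming single-line input and bare substrings with no word boundaries, just like the pattern above; `tempered_scan` and its default arguments are hypothetical names chosen for illustration.

```python
def tempered_scan(text, start="i", exclude="hate", stop="pie"):
    """Hypothetical helper that walks the string the way ^i((?!hate|pie).)*pie does.

    Assumes plain substrings (no word boundaries), like the pattern above.
    """
    # ^i : the start word must open the string
    if not text.startswith(start):
        return False
    pos = len(start)
    # ((?!hate|pie).)* : take one character at a time, but only while neither
    # the excluded word nor the stop word begins here ('.' also refuses '\n')
    while (pos < len(text)
           and text[pos] != "\n"
           and not text.startswith(exclude, pos)
           and not text.startswith(stop, pos)):
        pos += 1
    # pie : the stop word must begin exactly where the loop halted; giving back
    # characters cannot help, since the loop never passed the start of a "pie"
    return text.startswith(stop, pos)


for s in ("i sure like eating pie, but i hate donuts",
          "i sure hate eating pie, but i like donuts"):
    print(tempered_scan(s), s)
```

Because the look-ahead is tested at every position rather than once, it can never see past the first "pie", so a later "hate" is irrelevant.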
Here's a Python program to test the pattern.
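import re

# (sentence, should_match) pairs from the question, including the update
test = [('i sure like eating pie, but i love donuts', True),
        ('i sure like eating pie, but i hate donuts', True),
        ('i sure hate eating pie, but i like donuts', False),
        ('i sure like eating pie, but i like donuts and i hate making pie', True)]

# re.X (verbose mode) ignores the whitespace inside the pattern
rx = re.compile(r"^i ((?!hate|pie).)* pie", re.X)
for t, v in test:
    m = rx.match(t)
    print(t, "pass" if bool(m) == v else "fail")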
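The question mentions that the real pattern uses word boundaries and is built dynamically, possibly with further terms after the stop word (as in `^i(?!.*hate).*pie.*donuts`). A minimal sketch of assembling the same tempered token in that setting with Python's `re` module follows; `build_pattern` is a hypothetical helper, and appending each trailing term as `.*\bword\b` is an assumption modelled on that example.

```python
import re


def build_pattern(start, exclude, stop, *rest):
    """Hypothetical helper: match `start`, then `stop`, with no `exclude`
    between them; any `rest` words must follow `stop`, in order."""
    s, x, t = (re.escape(w) for w in (start, exclude, stop))
    # Tempered token: consume one character at a time, but only where
    # neither the excluded word nor the stop word begins as a whole word.
    tempered = rf"(?:(?!\b{x}\b|\b{t}\b).)*"
    pattern = rf"^\b{s}\b{tempered}\b{t}\b"
    # Assumed shape for trailing terms: each one somewhere later, in order.
    for word in rest:
        pattern += rf".*\b{re.escape(word)}\b"
    return re.compile(pattern)


rx = build_pattern("i", "hate", "pie", "donuts")
print(bool(rx.match("i sure like eating pie, but i hate donuts")))
print(bool(rx.match("i sure hate eating pie, but i like donuts")))
```

Listing the stop word inside the look-ahead makes the token halt at the first whole-word `pie`, so a `hate` that appears only after it is never examined.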