Match text only if it contains all words from whitelist, but none from blacklist
Question
I guess it'll be easier to understand what I'm trying to achieve with this example:
Let's say we have this whitelist: one two three. And this blacklist: four five. Then:
"three one two" is a matching text (contains all whitelist words);
"one three two six" is a matching text (contains all whitelist words);
"two one" is not a matching text (lacks the whitelist word "three");
"one four two three" is not a matching text (contains the blacklist word "four").
Could anyone help me out with a regex for this case?
Solution
This is not something you'd want to use a regex for. Better do it like this (example in Python):
>>> whitelist = ["one", "two", "three"]
>>> blacklist = ["four", "five"]
>>> texts = ["three two one", "one three two six", "one two", "one two three four"]
>>> for text in texts:
...     mytext = text.split()
...     if all(word in mytext for word in whitelist) and \
...        not any(word in mytext for word in blacklist):
...         print(text)
...
three two one
one three two six
>>>
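The same check can be wrapped in a reusable function. The following is only a minimal sketch, assuming words are separated by whitespace; the name `matches_word_lists` is hypothetical and does not come from the original answer, and it swaps the `all`/`any` loops for set operations:

```python
def matches_word_lists(text, whitelist, blacklist):
    """Return True if text has every whitelist word and no blacklist word."""
    # Assumption: words are separated by whitespace, with no punctuation attached.
    words = set(text.split())
    # Every whitelist word must be present, and no blacklist word may be.
    return set(whitelist) <= words and words.isdisjoint(blacklist)


whitelist = ["one", "two", "three"]
blacklist = ["four", "five"]
texts = ["three two one", "one three two six", "one two", "one two three four"]

matching = [t for t in texts if matches_word_lists(t, whitelist, blacklist)]
print(matching)  # ['three two one', 'one three two six']
```

Because the text is split on whitespace only, a word followed by punctuation (such as "one,") would not count as "one"; input like that would need extra normalization first.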
You can do it with a regex, though:
^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)(?!.*\bfour\b)(?!.*\bfive\b)
^ anchors the search at the start of the string.
(?=...) ensures that its contents can be matched from the current position.
(?!...) ensures that its contents can't be matched from the current position.
\bone\b matches "one" but not "lonely".
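To see these pieces in isolation, here is a small sketch using Python's `re` module. The sample strings are made up for illustration and are not part of the original answer:

```python
import re

# \bone\b requires a word boundary on both sides of "one".
word = re.compile(r"\bone\b")
found_in_phrase = word.search("two one three") is not None
found_in_lonely = word.search("lonely") is not None

# A lookahead consumes no characters: when it succeeds,
# the match is empty and still sits at position 0.
lookahead = re.match(r"(?=.*\btwo\b)", "one two")

# A negative lookahead fails when its contents can be matched.
negative = re.match(r"(?!.*\bfour\b)", "one four")

print(found_in_phrase, found_in_lonely)  # True False
print(lookahead.span(), negative)        # (0, 0) None
```

Since each lookahead leaves the position unchanged, several of them can be chained after ^ and every one is tested against the whole string.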
So you get:
>>> import re
>>> r = re.compile(r"^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)(?!.*\bfour\b)(?!.*\bfive\b)")
>>> for text in texts:
...     if r.match(text):
...         print(text)
...
three two one
one three two six
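If the word lists change, the pattern can be generated instead of written by hand. This is a sketch under assumptions rather than part of the original answer: the helper name `build_pattern` is hypothetical, and it assumes every word starts and ends with a word character so that \b behaves as expected.

```python
import re


def build_pattern(whitelist, blacklist):
    """Compile a regex that requires all whitelist words and forbids all blacklist words."""
    # One positive lookahead per required word; re.escape guards against
    # regex metacharacters appearing inside a word.
    required = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in whitelist)
    # One negative lookahead per forbidden word.
    forbidden = "".join(rf"(?!.*\b{re.escape(w)}\b)" for w in blacklist)
    return re.compile("^" + required + forbidden)


pattern = build_pattern(["one", "two", "three"], ["four", "five"])
texts = ["three two one", "one three two six", "one two", "one two three four"]

matching = [t for t in texts if pattern.match(t)]
print(matching)  # ['three two one', 'one three two six']
```

Note that . does not match a newline by default, so for multi-line text the pattern would need to be compiled with re.DOTALL.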