Match text only if it contains all words from whitelist, but none from blacklist

Views: 47  Published: 2021/7/6 20:20:38  regex

Problem description

I guess it'll be easier to understand what I'm trying to achieve with this example:

Let's say we have this whitelist: one two three. And this blacklist: four five. Then:

three one two is a matching text (contains all whitelist words);
one three two six is a matching text (contains all whitelist words);
two one is not a matching text (it lacks the whitelist word three);
one four two three is not a matching text (it contains the blacklist word four).

Could anyone help me out with a regex for this case?

Solution

This is not something you'd want to use a regex for. Better do it like this (example in Python):

>>> whitelist = ["one", "two", "three"]
>>> blacklist = ["four", "five"]
>>> texts = ["three two one", "one three two six", "one two", "one two three four"]
>>> for text in texts:
...     mytext = text.split()
...     if all(word in mytext for word in whitelist) and \
...        not any(word in mytext for word in blacklist):
...         print(text)
...
three two one
one three two six
>>>
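
The same check can be wrapped in a reusable function. What follows is only a minimal sketch, not part of the original answer: the function name matches is made up for illustration, and it assumes words are separated by whitespace and compared case-sensitively, exactly as in the session above. It uses set operations instead of all()/any(), which should behave the same for these inputs.

```python
def matches(text, whitelist, blacklist):
    """Return True if text has every whitelist word and no blacklist word."""
    # Whitespace-separated tokens, compared case-sensitively (an assumption).
    words = set(text.split())
    # Every whitelist word must be present, and no blacklist word may be.
    return set(whitelist) <= words and words.isdisjoint(blacklist)


whitelist = ["one", "two", "three"]
blacklist = ["four", "five"]
texts = ["three two one", "one three two six", "one two", "one two three four"]

matching = [t for t in texts if matches(t, whitelist, blacklist)]
print(matching)  # ['three two one', 'one three two six']
```

Turning the text into a set once keeps each membership test cheap, which may matter if the word lists grow long.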

You can do it with a regex, though:

^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)(?!.*\bfour\b)(?!.*\bfive\b)

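^ anchors the search at the start of the string.
(?=...) is a positive lookahead: it ensures that its contents can be matched from the current position, without consuming any text.
(?!...) is a negative lookahead: it ensures that its contents can't be matched from the current position.
\b is a word boundary, so \bone\b matches one as a whole word but not the one inside lonely.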
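
If the word lists are not fixed, the pattern can be assembled from them instead of being typed by hand. This is a hedged sketch rather than something from the original answer: build_pattern is a hypothetical helper name, and it assumes every word starts and ends with a letter, digit or underscore, since \b only behaves as a word boundary next to such characters.

```python
import re


def build_pattern(whitelist, blacklist):
    """Compile a lookahead-based pattern from two word lists (illustrative helper)."""
    # One positive lookahead per required word.
    required = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in whitelist)
    # One negative lookahead per forbidden word.
    forbidden = "".join(rf"(?!.*\b{re.escape(w)}\b)" for w in blacklist)
    return re.compile("^" + required + forbidden)


pattern = build_pattern(["one", "two", "three"], ["four", "five"])
print(pattern.pattern)
print(bool(pattern.match("three one two")))       # True
print(bool(pattern.match("one four two three")))  # False
```

re.escape keeps any regex metacharacters in a word from being interpreted as syntax. Note that . does not match a newline by default, so for multi-line text the re.DOTALL flag would probably be needed.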

So you get:

>>> import re
>>> r = re.compile(r"^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)(?!.*\bfour\b)(?!.*\bfive\b)")
>>> for text in texts:
...     if r.match(text):
...         print(text)
...
three two one
one three two six
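
The two approaches agree on the sample texts, but they are not interchangeable once punctuation is involved. The sketch below illustrates this; the helper names split_match and regex_match and the two punctuated inputs are assumptions added here, not part of the original answer. str.split() keeps punctuation attached to the tokens, whereas \b treats it as a word boundary.

```python
import re

whitelist = ["one", "two", "three"]
blacklist = ["four", "five"]
r = re.compile(r"^(?=.*\bone\b)(?=.*\btwo\b)(?=.*\bthree\b)(?!.*\bfour\b)(?!.*\bfive\b)")


def split_match(text):
    # Whitespace tokens keep their punctuation, e.g. "one," or "four.".
    words = text.split()
    return (all(w in words for w in whitelist)
            and not any(w in words for w in blacklist))


def regex_match(text):
    # \b sees the boundary between a letter and punctuation.
    return r.match(text) is not None


for text in ["one, two, three", "one two three four."]:
    print(repr(text), split_match(text), regex_match(text))
# 'one, two, three' False True
# 'one two three four.' True False
```

Which behavior is right depends on the input: for plain space-separated words the split version is simpler, while punctuated prose may be better served by the regex or by stripping punctuation before splitting.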
