RegEx to exclude match if a certain word is present, but not another partial word
Question
Our firewall uses the keyword "cum" to block adult sites. The problem is that this works a little too well, because it also blocks any URL containing the word "document".
The firewall will take regex strings, and I tried this:
^.*(?!document)cum.*$
But it still matches "document". I have a feeling I should be using a pipe (|), but I don't get it.
I want a match anywhere *cum* is found in the URL (or domain name), but NOT if the word is document or documents.
Is this possible? As I understand it, a word boundary doesn't work here, because the word cum won't necessarily be separated by whitespace when it's in a URL, and definitely not when it's in a domain name.
Here's another way to put it:
Allow "examplesearchdocuments.com"
Allow "examplemydocuments.com"
Allow "documentexample.com"
Allow "example.com/somedocuments"
Don't allow "funnycumsiteexample.com"
Don't allow "cumallovereverythingexample.com"
Don't allow "exampleseemycum.com"
where cum is the bad-word match. Sorry if any of these examples are real sites; I don't know how else to convey this.
Recommended answer
As the comments pointed out, I was wrong.
If you use a lookbehind inside your lookahead, you can match "cum" only if it is not within the word "document".
cum(?!(?<=docum)ent)
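As a rough check, the pattern can be run against the example URLs from the question. This is only a sketch in Python's re module, on the assumption that the firewall's regex engine handles lookahead and lookbehind the same way; the helper name is_blocked is made up for illustration. The last assertion shows why the pattern from the question fails: its lookahead is tested at the position just before "cum", where the upcoming text starts with "cum" and never with "document", so it can never reject anything.

```python
import re

# Pattern from the answer: match "cum" unless it is the "cum" inside "document".
# After "cum" is consumed, the negative lookahead rejects the match only when
# the preceding five characters are "docum" AND the next three are "ent".
BLOCK = re.compile(r"cum(?!(?<=docum)ent)")

# Pattern from the question, kept for comparison.
ATTEMPT = re.compile(r"^.*(?!document)cum.*$")

ALLOWED = [
    "examplesearchdocuments.com",
    "examplemydocuments.com",
    "documentexample.com",
    "example.com/somedocuments",
]

BLOCKED = [
    "funnycumsiteexample.com",
    "cumallovereverythingexample.com",
    "exampleseemycum.com",
]


def is_blocked(url: str) -> bool:
    """Hypothetical helper: True if the answer's pattern matches anywhere in url."""
    return BLOCK.search(url) is not None


for url in ALLOWED:
    assert not is_blocked(url), url

for url in BLOCKED:
    assert is_blocked(url), url

# The question's attempt still matches a URL that should be allowed.
assert ATTEMPT.search("documentexample.com") is not None
```

Note that a URL containing both "document" and a separate "cum" would still be blocked, since the second occurrence matches on its own.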
Here is some reading on lookaround: http://www.regular-expressions.info/lookaround.html
Here it is against a large number of tests:
http://www.rubular.com/r/b5iZrn6Cjz