Regex to match all URLs except certain URLs


Question

I need to match all valid URLs except:

http://www.w3.org
http://w3.org/foo
http://www.tempuri.org/foo

More generally, all URLs except those in certain domains.

Here's what I have so far:

https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?  

will match URLs that are close enough to my needs (but in no way all valid URLs!) (thanks, http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/!)

https?://www\.(?!tempuri|w3)\S*

will match all URLs that start with www., except those whose next label begins with tempuri or w3.

I really want

https?://([-\w\.]+)(?!tempuri|w3)\S*

to work, but as far as I can tell, it seems to select all http:// strings.

Gah, I should just do this in something higher up the Chomsky hierarchy!
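
One way to read "higher up the Chomsky hierarchy" is to parse the URL in code and compare the host against a blocklist instead of encoding the exclusion in the regex. The following is a minimal sketch of that idea, not something from the original thread; the names BLOCKED_DOMAINS and is_allowed are hypothetical, and the blocklist is simply the two domains from the question.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist, taken from the domains named in the question.
BLOCKED_DOMAINS = {"w3.org", "tempuri.org"}


def is_allowed(url: str) -> bool:
    """Return True for an http(s) URL whose host is not a blocked domain
    or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    # Block the domain itself and anything ending in ".<domain>".
    return not any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


for url in ("http://www.w3.org", "http://w3.org/foo",
            "http://www.tempuri.org/foo", "http://example.com/w3.org"):
    print(url, is_allowed(url))
```

Because the comparison is made on the parsed host, a blocked name that only appears in the path (as in the last URL) does not cause a rejection.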

Recommended answer

The following regular expression:

https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*

matches only the first four lines of the following excerpt:

https://ok1.url.com
http://ok2.url.com
https://not.ok.tempuri.com
http://not-ok.either.w3.com

http://no1.w3.org
http://no2.w3.org
http://tempuri.bla.com
http://no4.tempuri.bla
http://no3.tempuri.org
http://w3.org/foo
http://www.tempuri.org/foo
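
The claim about the first four lines can be checked with a short script. This is a sketch in Python (the thread does not name a regex flavour, so the choice of Python's re module is an assumption); the variable names are illustrative.

```python
import re

# The regular expression quoted above, unchanged.
PATTERN = re.compile(r"https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*")

# The excerpt from the answer, in the same order.
URLS = [
    "https://ok1.url.com",
    "http://ok2.url.com",
    "https://not.ok.tempuri.com",
    "http://not-ok.either.w3.com",
    "http://no1.w3.org",
    "http://no2.w3.org",
    "http://tempuri.bla.com",
    "http://no4.tempuri.bla",
    "http://no3.tempuri.org",
    "http://w3.org/foo",
    "http://www.tempuri.org/foo",
]

# The lookaheads guard only the first and second host labels, so a blocked
# name in the third label (lines 3 and 4) still gets through.
matched = [u for u in URLS if PATTERN.match(u)]
for u in matched:
    print(u)
```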

I know what you're thinking, and the answer is that in order to match the above list and only return the first two lines you'd have to use the following regular expression:

https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*

which, in truth, is nothing more than a slight modification of the first regular expression, where the

(?!w3|tempuri)([-\w]*\.)

part appears twice in a row.
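
Repeating the guarded part has two side effects that are easy to miss: the pattern now needs at least two dot-terminated labels, so a two-label host such as http://example.com no longer matches, and (?!w3|tempuri) also rejects labels that merely begin with a blocked name. The sketch below checks the two-label pattern against the excerpt and compares it with a hypothetical generalisation, ANY_LABEL, which is not from the answer: a single lookahead that scans every dot-terminated host label for an exact blocked name. The test URLs http://example.com and http://w3schools.com are likewise assumptions added for illustration.

```python
import re

# Pattern from the answer: the guarded label appears twice.
TWO_LABELS = re.compile(
    r"https?://(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)([-\w]*\.)(?!w3|tempuri)\S*"
)

# Hypothetical generalisation (an assumption, not the answer's regex):
# reject the URL if any dot-terminated host label is exactly w3 or tempuri.
ANY_LABEL = re.compile(r"https?://(?!(?:[-\w]+\.)*(?:w3|tempuri)\.)\S+")

URLS = [
    "https://ok1.url.com",
    "http://ok2.url.com",
    "https://not.ok.tempuri.com",
    "http://not-ok.either.w3.com",
    "http://no1.w3.org",
    "http://no2.w3.org",
    "http://tempuri.bla.com",
    "http://no4.tempuri.bla",
    "http://no3.tempuri.org",
    "http://w3.org/foo",
    "http://www.tempuri.org/foo",
]

two_label_hits = [u for u in URLS if TWO_LABELS.match(u)]
any_label_hits = [u for u in URLS if ANY_LABEL.match(u)]
print(two_label_hits)
print(any_label_hits)

# Where the two patterns differ: short hosts and labels that only start
# with a blocked name.
for extra in ("http://example.com", "http://w3schools.com"):
    print(extra, bool(TWO_LABELS.match(extra)), bool(ANY_LABEL.match(extra)))
```

Both patterns keep only the first two lines of the excerpt; they differ only on inputs like the two extra URLs.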

The reason why your regular expression wasn't working is that the . sits inside the repeated group ([-\w\.]+), which means the group can match not only this. and this.this. but also this.this.th. In other words, it doesn't necessarily end in a dot, so the engine will let it end wherever it has to so that the rest of the expression matches, and the negative lookahead after it can always be satisfied. Try it out in a regular expression tester and you'll see what I mean.
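
In place of an online tester, the behaviour can be reproduced with a few lines of Python (an assumed flavour; the thread does not specify one). The sketch prints what the repeated group captured for the three URLs the question wanted to exclude.

```python
import re

# The pattern from the question that was meant to exclude tempuri and w3.
ASKER = re.compile(r"https?://([-\w\.]+)(?!tempuri|w3)\S*")

EXCLUDED = [
    "http://www.w3.org",
    "http://w3.org/foo",
    "http://www.tempuri.org/foo",
]

captures = {}
for url in EXCLUDED:
    m = ASKER.match(url)
    # The greedy class swallows the entire host, dots included, so the
    # lookahead is only tested after the host, where neither "tempuri"
    # nor "w3" can follow.
    captures[url] = m.group(1) if m else None
    print(url, "->", captures[url])
```

In each case the group captures the full host (www.w3.org, w3.org, www.tempuri.org), so the lookahead is evaluated at the first character the class cannot consume and never sees the blocked name.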
