Regex to match a URL with optional 'www' and protocol
Problem description
I'm trying to write a regexp.
Some background: I am trying to see whether the REQUEST_URI of my website's URL contains another URL, like this:
- http://mywebsite.com/google.com/search=xyz
However, the URL won't always contain the 'http' or the 'www', so the pattern should also match strings like:
- http://mywebsite.com/yahoo.org/search=xyz
- http://mywebsite.com/www.yahoo.org/search=xyz
- http://mywebsite.com/msn.co.uk
- http://mywebsite.com/http://msn.co.uk
There are a bunch of regexps out there to match URLs, but none that I have found do an optional match on the http and www.
I'm wondering if the pattern to match could be something like:
^([a-z]).(com | ca | org | etc)(.)
I thought another option might be to just match any string that has a dot (.) in it, as the other REQUEST_URIs in my application typically won't contain dots.
Does this make sense to anyone? I'd really appreciate some help with this; it's been blocking my project for weeks.
Thank you very much. -Tim
Recommended answer
I suggest a simple approach that builds on what you said: match anything with a dot in it, but take the forward slashes into account too, so that everything is captured and unusual URLs are not missed. Something like:
^((?:https?:\/\/)?[^./]+(?:\.[^./]+)+(?:\/.*)?)$
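Which reads as:
- an optional http:// or https://
- one or more characters that are not a dot or a forward slash
- one or more groups of a dot followed by characters that are not a dot or a forward slash
- an optional forward slash and anything after it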
The whole thing is captured in the first group.
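To make the breakdown concrete, here is a minimal Python sketch of the same pattern, written with re.VERBOSE so each part sits next to its description. The function name embedded_url and the step that strips the leading slash from the request path are assumptions for illustration, not part of the original answer. The slashes are left unescaped because Python does not need `\/`; the pattern is otherwise unchanged.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE.
URL_LIKE = re.compile(
    r"""
    ^(                    # group 1: the whole URL-like string
        (?:https?://)?    # optional http:// or https://
        [^./]+            # one or more characters that are not a dot or slash
        (?:\.[^./]+)+     # one or more groups of: dot + non-dot, non-slash characters
        (?:/.*)?          # optional forward slash and anything after it
    )$
    """,
    re.VERBOSE,
)


def embedded_url(request_uri):
    """Return the URL-like remainder of a request path, or None.

    Hypothetical helper: assumes request_uri is the path portion only,
    for example "/www.yahoo.org/search=xyz".
    """
    # Drop the single leading slash that separates the host from the path.
    candidate = request_uri[1:] if request_uri.startswith("/") else request_uri
    match = URL_LIKE.match(candidate)
    return match.group(1) if match else None


for path in ["/google.com/search=xyz", "/http://msn.co.uk", "/about"]:
    print(path, "->", embedded_url(path))
```

An ordinary page path such as /about yields None because it contains no dot, which is the property the question relies on.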
It would match, for example:
- nic.uk
- nic.uk/
- http://nic.uk
- http://nic.uk/
- https://example.com/test/?a=bcd
Verifying that they are valid URLs is another story! It would also match:
- index.php
It would not match:
- directory/index.php
The minimal match is basically something.something, with no forward slash in it unless the slash comes at least one character past the dot. So just be sure not to use that format for anything else in your application.
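A quick way to see where that boundary falls is to run the pattern over a few strings. The sketch below is illustrative only; the sample strings other than nic.uk, index.php and directory/index.php (in particular v1.2/report) are made-up paths, chosen to show a false positive you might hit in a real application.

```python
import re

# The pattern from the answer (slashes left unescaped, which is equivalent in Python).
URL_LIKE = re.compile(r"^((?:https?://)?[^./]+(?:\.[^./]+)+(?:/.*)?)$")

# Hypothetical sample strings, with the leading slash of the path already removed.
samples = ["nic.uk", "index.php", "v1.2/report", "directory/index.php", "about"]

results = {s: bool(URL_LIKE.match(s)) for s in samples}
for s, matched in results.items():
    print(f"{s!r}: {'match' if matched else 'no match'}")
```

Here index.php and v1.2/report both match because a dot appears before the first slash, while directory/index.php and about do not. If the application has routes shaped like the first two, they would need to be excluded before this check.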