Regex to match a URL with optional 'www' and protocol


Question

I'm trying to write a regexp.

Some background info: I am trying to see whether the REQUEST_URI of my website's URL contains another URL. The embedded URL won't always contain the 'http' or the 'www', so the pattern should match strings like:

  • http://mywebsite.com/yahoo.org/search=xyz
  • http://mywebsite.com/www.yahoo.org/search=xyz
  • http://mywebsite.com/msn.co.uk
  • http://mywebsite.com/http://msn.co.uk

There are a bunch of regexps out there that match URLs, but none that I have found make the 'http' and 'www' optional.

I'm wondering if the pattern to match could be something like:

^([a-z]).(com | ca | org | etc)(.)

I thought another option might be to just match any string that has a dot (.) in it, as the other REQUEST_URIs in my application typically won't contain dots.

Does this make sense to anyone? I'd really appreciate some help with this; it's been blocking my project for weeks.

Thank you very much. -Tim

Recommended answer

I suggest a simple approach that builds on what you said: treat anything with a dot in it as a URL, but take the forward slashes into account too, so that you capture everything and don't miss unusual URLs. Something like:

^((?:https?:\/\/)?[^./]+(?:\.[^./]+)+(?:\/.*)?)$

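It reads as:

  • an optional http:// or https://
  • one or more characters that are neither a dot nor a forward slash
  • one or more groups of a dot followed by characters that are neither a dot nor a forward slash
  • an optional forward slash and anything after it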

The whole match is captured in the first group.
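As a minimal sketch of that breakdown, the pattern can be written in Python's verbose regex mode with one comment per piece. The answer does not name a language, so Python, the `URL_PATTERN` name, and the `extract_url` helper are assumptions made for illustration. The escaped slashes (`\/`) from the original are dropped because Python does not need them; the pattern is otherwise the same.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so that each
# piece of the breakdown sits next to the part of the regex it describes.
URL_PATTERN = re.compile(
    r"""
    ^(                   # group 1: the whole candidate URL
        (?:https?://)?   # optional http:// or https://
        [^./]+           # characters that are neither a dot nor a slash
        (?:\.[^./]+)+    # one or more groups of: a dot, then such characters
        (?:/.*)?         # optional forward slash and anything after it
    )$
    """,
    re.VERBOSE,
)


def extract_url(text):
    """Return the text captured by group 1, or None if text does not match."""
    match = URL_PATTERN.match(text)
    return match.group(1) if match else None


print(extract_url("https://example.com/test/?a=bcd"))
print(extract_url("directory/index.php"))
```

Because the capturing group spans the whole pattern between `^` and `$`, group 1 is simply the full input whenever the match succeeds.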

It would match, for example:

  • nic.uk
  • nic.uk/
  • http://nic.uk
  • http://nic.uk/
  • https://example.com/test/?a=bcd

Verifying that they are valid URLs is another story! It would also match:

  • index.php

But it would not match:

  • directory/index.php

The minimal match is basically something.something; a forward slash is allowed only once it comes at least one character after a dot. So just be sure that none of your other request paths use that format.
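To tie this back to the question, here is a rough sketch of running the pattern against request paths. It assumes that REQUEST_URI begins with a single "/" and that the embedded URL is everything after that slash. Neither assumption is stated in the question, and `embedded_url` is a hypothetical helper name. The "/products/list" path is an invented example of an ordinary route without dots.

```python
import re

# The answer's pattern, unchanged (the escaped slashes are harmless in Python).
URL_PATTERN = re.compile(r"^((?:https?:\/\/)?[^./]+(?:\.[^./]+)+(?:\/.*)?)$")


def embedded_url(request_uri):
    """Return the URL-like part of a request path, or None.

    Assumption: request_uri starts with one "/" and the embedded URL,
    if there is one, is everything that follows that slash.
    """
    candidate = request_uri[1:] if request_uri.startswith("/") else request_uri
    match = URL_PATTERN.match(candidate)
    return match.group(1) if match else None


paths = [
    "/yahoo.org/search=xyz",
    "/www.yahoo.org/search=xyz",
    "/msn.co.uk",
    "/http://msn.co.uk",
    "/products/list",  # invented route without a dot: no match
    "/index.php",      # false positive, as the answer warns
]
for path in paths:
    print(path, "->", embedded_url(path))
```

The last path shows the caveat in practice: a plain file name such as index.php is indistinguishable from a host name under this pattern, so the application's own routes need to avoid the something.something shape.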
