Question about URL Validation with Regex

Question

I have the following regex that does a great job matching URLs:

((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)

However, it does not handle URLs without a prefix, i.e. stackoverflow.com or www.google.com do not match. Does anyone know how I can modify this regex so that it does not care whether there is a prefix?

Is my question too vague? Does it need more details?

(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\)))?[\w\d:#@%/;$()~_?\+-=\\\.&]*)

I added a ()? around the protocols, as Vinko Vrsalovic suggested, but now the regex matches nearly any string, as long as it consists of valid URL characters.
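
To make the over-matching concrete, here is a minimal Python sketch of the pattern above. The names `OPTIONAL_PREFIX` and `accepts` are made up for this illustration, and Python's `re` module is assumed to treat the pattern the same way as the original regex flavor, which the question does not name.

```python
import re

# The pattern from above, with the protocol group made optional.
OPTIONAL_PREFIX = re.compile(
    r"(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\)))?"
    r"[\w\d:#@%/;$()~_?\+-=\\\.&]*)"
)


def accepts(text: str) -> bool:
    """Return True if the whole string is accepted by the pattern."""
    return OPTIONAL_PREFIX.fullmatch(text) is not None


# Real URLs, a plain word, a phone-like string, and the empty string.
samples = ["http://stackoverflow.com", "www.google.com", "hello", "555-1234", ""]
for sample in samples:
    print(repr(sample), accepts(sample))
```

Every part of the pattern is now optional (the prefix is wrapped in `?` and the character class is followed by `*`), so even the empty string is accepted, and nothing in it requires a dot or a hostname-like structure.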

In my implementation, I have a database whose contents I manage, and it has a field that holds either plain text, a phone number, a URL, or an email address. I was looking for an easy way to validate the input so that I can format it properly, i.e. create anchor tags for URLs and email addresses, and format phone numbers the same way I format the other numbers throughout the site. Any suggestions?

Recommended Answer

The regex below is from the wonderful book Mastering Regular Expressions. If you are not familiar with free-spacing/comments mode, I suggest you get familiar with it.

\b
# Match the leading part (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with our more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\b
        | edu\b
        | biz\b
        | gov\b
        | in(?:t|fo)\b # .int or .info
        | mil\b
        | net\b
        | org\b
        | name\b
        | coop\b
        | aero\b
        | museum\b
        | [a-z][a-z]\b # two-letter country codes
    )
)

# Allow an optional port number
( : \d+ )?

# The rest of the URL is optional, and begins with / . . . 
(
     /
     # The rest are heuristics for what seems to work well
     [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
     (?:
        [.!,?]+  [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
     )*
)?

To explain this regex briefly (for a full explanation, get the book): URLs have one or more dot-separated parts, ending with either one of a limited list of final parts or a two-letter country code (.uk, .fr, ...). The parts may contain any alphanumeric characters or hyphens ('-'), but a hyphen may not be the first or last character of a part. Then there may be a port number, followed by the rest of the URL.
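
As a rough sketch of how this could be tried out, the pattern can be pasted into Python using `re.VERBOSE`, which is Python's free-spacing/comments mode. The names `URL_RE` and `find_url` are made up for this illustration, and the scoped flags `(?i: ...)` and `(?-i: ...)` assume Python 3.6 or later.

```python
import re
from typing import Optional

# The pattern from above, compiled in free-spacing (verbose) mode.
URL_RE = re.compile(r"""
\b
# Match the leading part (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with our more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\b
        | edu\b
        | biz\b
        | gov\b
        | in(?:t|fo)\b # .int or .info
        | mil\b
        | net\b
        | org\b
        | name\b
        | coop\b
        | aero\b
        | museum\b
        | [a-z][a-z]\b # two-letter country codes
    )
)
# Allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with /
(
     /
     # The rest are heuristics for what seems to work well
     [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
     (?:
        [.!,?]+  [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
     )*
)?
""", re.VERBOSE)


def find_url(text: str) -> Optional[str]:
    """Return the first URL-like substring of text, or None if there is none."""
    match = URL_RE.search(text)
    return match.group(0) if match else None


print(find_url("See www.google.com."))
print(find_url("http://regex.info/listing.cgi?ed=3&p=207"))
print(find_url("hello"))
```

Unlike the version with an optional prefix, a bare word such as "hello" or a phone-like string such as "555-1234" finds no match here, because a string without a protocol must contain at least one dot-separated part followed by a recognized ending.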

To get this from the website, go to http://regex.info/listing.cgi?ed=3&p=207 (it is from page 207 of the 3rd edition).

The page says "Copyright © 2008 Jeffrey Friedl", so I'm not sure exactly what the conditions for use are, but I would expect that if you own the book you can use it, so I'm hoping I'm not breaking the rules by posting it here.
