Question about URL Validation with Regex
Question
I have the following regex that does a great job matching URLs:
((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)
However, it does not handle URLs without a prefix, i.e. stackoverflow.com or www.google.com do not match. Does anyone know how I can modify this regex so that it does not care whether there is a prefix?
Is my question too vague? Does it need more details?
(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\)))?[\w\d:#@%/;$()~_?\+-=\\\.&]*)
I added a ()? around the protocols as Vinko Vrsalovic suggested, but now the regex matches nearly any string, as long as it consists of valid URL characters.
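To make the problem concrete, here is a small Python check of the two patterns above. The patterns are copied verbatim from the question; the names WITH_PREFIX and OPTIONAL_PREFIX and the use of Python's re module are my own assumptions for illustration, since the question does not say which regex engine is in use.

```python
import re

# Both patterns are copied verbatim from the question.
# WITH_PREFIX requires a protocol; OPTIONAL_PREFIX wraps it in ( ... )?.
WITH_PREFIX = re.compile(
    r"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+"
    r"[\w\d:#@%/;$()~_?\+-=\\\.&]*)"
)
OPTIONAL_PREFIX = re.compile(
    r"(((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\)))?"
    r"[\w\d:#@%/;$()~_?\+-=\\\.&]*)"
)

def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

for sample in ["http://stackoverflow.com", "www.google.com", "hello", ""]:
    print(repr(sample), accepts(WITH_PREFIX, sample), accepts(OPTIONAL_PREFIX, sample))
```

Once the protocol is optional, the only remaining requirement is "zero or more URL characters", so plain words and even the empty string pass. Something else, such as a required dot-separated host name, has to take over the job the prefix was doing.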
My use case: I manage the contents of a database, and it has a field that holds either plain text, a phone number, a URL or an email address. I was looking for an easy way to validate the input so I can format it properly, i.e. create anchor tags for URLs and email addresses, and format phone numbers the same way other numbers are formatted throughout the site. Any suggestions?
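One way to approach the field described above is to classify each value first and format it second. The following is only a rough sketch: the EMAIL, PHONE and URL patterns, and the function names classify and to_html, are hypothetical and deliberately simple. None of them come from the question, and the phone pattern in particular is a guess, since the site's phone format is not given.

```python
import re
from html import escape

# Hypothetical, deliberately loose patterns (assumptions, not from the question).
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?[\d\s().-]{7,20}")
URL = re.compile(
    r"(?:(?:https?|ftp)://)?"                           # optional protocol
    r"(?:[A-Za-z0-9](?:[-A-Za-z0-9]*[A-Za-z0-9])?\.)+"  # dot-separated labels
    r"[A-Za-z]{2,}"                                     # final label, letters only
    r"(?::\d+)?(?:/\S*)?"                               # optional port and path
)

def classify(value: str) -> str:
    """Return 'email', 'url', 'phone' or 'text' for a field value."""
    value = value.strip()
    if EMAIL.fullmatch(value):
        return "email"
    if URL.fullmatch(value):
        return "url"
    if PHONE.fullmatch(value):
        return "phone"
    return "text"

def to_html(value: str) -> str:
    """Wrap URLs and email addresses in anchor tags; leave the rest as escaped text."""
    kind = classify(value)
    text = escape(value.strip())
    if kind == "email":
        return f'<a href="mailto:{text}">{text}</a>'
    if kind == "url":
        # Assume http:// when the value has no protocol of its own.
        href = text if "://" in text else "http://" + text
        return f'<a href="{href}">{text}</a>'
    return text

print(to_html("www.google.com"))
```

Email is tested before URL because an address such as someone@example.com ends in something that looks like a host name. Phone numbers are returned unchanged here; the site-specific formatting would go in that branch.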
Recommended answer
The regex below is from the excellent book Mastering Regular Expressions. If you are not familiar with free-spacing/comments mode, I suggest you get familiar with it.
\b
# Match the leading part (proto://hostname, or just hostname)
(
# ftp://, http://, or https:// leading part
(ftp|https?)://[-\w]+(\.\w[-\w]*)+
|
# or, try to find a hostname with our more specific sub-expression
(?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
# Now ending .com, etc. For these, require lowercase
(?-i: com\b
| edu\b
| biz\b
| gov\b
| in(?:t|fo)\b # .int or .info
| mil\b
| net\b
| org\b
| name\b
| coop\b
| aero\b
| museum\b
| [a-z][a-z]\b # two-letter country codes
)
)
# Allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
/
# The rest are heuristics for what seems to work well
[^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
(?:
[.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
)*
)?
To explain this regex briefly (for a full explanation, get the book): a URL has one or more dot-separated parts and ends with either one of a limited list of top-level domains or a two-letter country code (.uk, .fr, ...). Each part may contain any alphanumeric characters or hyphens '-', but a hyphen may not be the first or last character of a part. After that there may be a port number, and then the rest of the URL.
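For anyone who wants to try it, here is a sketch of the same pattern compiled in Python, assuming Python 3.6 or later (needed for the scoped (?i:...) and (?-i:...) groups). Free-spacing mode is switched on with re.VERBOSE. The regex text is the listing above; the names URL_RE and is_url are mine, and using fullmatch to turn a "find URLs in text" pattern into a validator is my own choice, not something the book prescribes.

```python
import re

URL_RE = re.compile(r"""
\b
# Match the leading part (proto://hostname, or just hostname)
(
    # ftp://, http://, or https:// leading part
    (ftp|https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with our more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+   # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\b
        | edu\b
        | biz\b
        | gov\b
        | in(?:t|fo)\b   # .int or .info
        | mil\b
        | net\b
        | org\b
        | name\b
        | coop\b
        | aero\b
        | museum\b
        | [a-z][a-z]\b   # two-letter country codes
    )
)
# Allow an optional port number
( : \d+ )?
# The rest of the URL is optional, and begins with / . . .
(
    /
    # The rest are heuristics for what seems to work well
    [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]*
    (?:
        [.!,?]+ [^.!,?;"'<>()\[\]{}\s\x7F-\xFF]+
    )*
)?
""", re.VERBOSE)

def is_url(text: str) -> bool:
    """Return True if the whole string looks like a URL to this pattern."""
    return URL_RE.fullmatch(text) is not None

for sample in ["www.google.com", "stackoverflow.com",
               "http://stackoverflow.com/questions", "hello"]:
    print(sample, is_url(sample))
```

Use URL_RE.search or URL_RE.finditer instead of fullmatch to pull URLs out of surrounding text, which is what the pattern was originally written for.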
The listing is available on the book's website at http://regex.info/listing.cgi?ed=3&p=207 (it is from page 207 of the 3rd edition).
The page says "Copyright © 2008 Jeffrey Friedl", so I'm not sure exactly what the conditions for use are, but I would expect that if you own the book you may use it, so I'm hoping I'm not breaking the rules by putting it here.