PHP regex for URL validation, filter_var is too permissive
Question
First, let's define a "URL" according to my requirements.
The only protocols optionally allowed are http:// and https://
then a mandatory domain name like stackoverflow.com
then optionally the rest of the URL components (path, query, hash, ...)
For reference, here is a list of valid and invalid URLs according to my requirements.
Valid:
- stackoverflow.com
- stackoverflow.com/questions/ask
- https://stackoverflow.com/questions/ask
- http://www.amazon.com/Computers-Internet-Books/b/ref=bhp_bb0309A_comint2?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=browse&pf_rd_r=0AH7GM29WF81Q72VPFDH&pf_rd_t=101&pf_rd_p=1273387142&pf_rd_i=283155
- amazon.com/Computers-Internet-Books/b/ref=bhp_bb0309A_comint2?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=browse&pf_rd_r=0AH7GM29WF81Q72VPFDH&pf_rd_t=101&pf_rd_p=1273387142&pf_rd_i=283155
- http://test-site.com (filter_var rejects this!!! I have domain names with dashes)
Invalid:
- http://www (PHP filter_var allows this; yes, I know it is a valid URL)
- http://www..des (PHP filter_var allows this)
- Any URL with disallowed characters in the domain name
For completeness, here is my PHP version: 5.3.2-1ubuntu4.2
Recommended answer
As a starting point you can use this one. It is written for JS, but it is easy to convert it to work with PHP's preg_match.
/^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$/i
For PHP, this one should work:
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';
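As a rough illustration of how this pattern could be applied, here is a minimal sketch that runs it through preg_match against a few URLs from the question. The helper name isValidDomainUrl is hypothetical and not part of the original answer.

```php
<?php
// Pattern copied from the answer above; '@' is the delimiter, 'i' makes it case-insensitive.
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

// Hypothetical helper, introduced only for this example.
function isValidDomainUrl($url, $reg)
{
    // preg_match returns 1 on a match, 0 on no match and false on error.
    return preg_match($reg, $url) === 1;
}

$samples = array(
    'stackoverflow.com',
    'http://test-site.com',
    'http://www',
    'http://www..des',
);

foreach ($samples as $url) {
    echo $url, ' => ', isValidDomainUrl($url, $reg) ? 'valid' : 'invalid', "\n";
}
```

Note that a URL with a path, such as stackoverflow.com/questions/ask, does not match this pattern on its own, because the pattern covers only the optional protocol and the domain.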
This regexp validates only the domain part anyway, but you can build on it, or split the URL at the first slash '/' (after the "://") and validate the domain part and the rest separately.
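The splitting approach might look something like the sketch below. The helper names splitUrl and isValidUrl are hypothetical, and the check applied to the rest of the URL (no whitespace allowed) is only a placeholder assumption, since the answer does not say how the rest should be validated.

```php
<?php
// Domain-only pattern from the answer above.
$reg = '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+[a-z]+$@i';

// Hypothetical helper: returns array(domain part, rest), split at the first '/'
// that follows the optional leading http:// or https://.
function splitUrl($url)
{
    // Skip a leading protocol so its slashes are not mistaken for the path.
    $offset = preg_match('@^https?://@i', $url, $m) ? strlen($m[0]) : 0;
    $slash = strpos($url, '/', $offset);
    if ($slash === false) {
        return array($url, '');
    }
    return array(substr($url, 0, $slash), substr($url, $slash));
}

// Hypothetical helper combining both checks.
function isValidUrl($url, $reg)
{
    list($domain, $rest) = splitUrl($url);
    // Assumption: the rest (path, query, hash) only has to be free of whitespace.
    return preg_match($reg, $domain) === 1
        && preg_match('@^\S*\z@', $rest) === 1;
}

var_dump(isValidUrl('https://stackoverflow.com/questions/ask', $reg));
var_dump(isValidUrl('http://www..des/path', $reg));
```

Because the split happens only at a slash, a URL that has a query or hash directly after the domain (with no path) would still be rejected by this sketch; handling that case would need a split on '?' and '#' as well.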
BTW: it would also validate "http://www.domain.com.com", but this is not an error, because a URL with a subdomain could look like "http://www.subdomain.domain.com", which is valid! And there is almost no way (or at least no practically easy way) to validate for a proper domain TLD with a regex, because you would have to write all possible TLDs inline into your regex ONE BY ONE, like this:
/^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+(com|it|net|uk|de)$/i
(This last one, for instance, would validate only domains ending in .com, .it, .net, .uk or .de, so .co.uk passes as well.) New TLDs come out all the time, so you would have to adjust your regex every time a new TLD comes out, and that's a pain in the neck!
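If a TLD whitelist is still wanted, one way to make it slightly less painful is to keep the TLDs in an array and build the pattern from it. This is only a sketch: the function name buildTldRegex and the short list are assumptions made for the example, and the list still has to be maintained by hand.

```php
<?php
// Hypothetical helper: builds the whitelist pattern from an array of TLDs.
function buildTldRegex($tlds)
{
    $escaped = array();
    foreach ($tlds as $tld) {
        // Escape each entry for a pattern that uses '@' as its delimiter.
        $escaped[] = preg_quote($tld, '@');
    }
    return '@^(https?\://)?(www\.)?([a-z0-9]([a-z0-9]|(\-[a-z0-9]))*\.)+('
        . implode('|', $escaped) . ')$@i';
}

// Deliberately short example list, same entries as the pattern above.
$reg = buildTldRegex(array('com', 'it', 'net', 'uk', 'de'));

var_dump(preg_match($reg, 'http://www.domain.com'));
var_dump(preg_match($reg, 'example.org'));
```

This does not remove the maintenance burden described above; it only moves the list out of the pattern so that adding a TLD is a one-word change.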