我可以改进这个正则表达式检查有效域名吗? [英] Can I improve this regex check for valid domain names?

查看:16
本文介绍了我可以改进这个正则表达式检查有效域名吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以,我一直在研究这个域名正则表达式.到目前为止,它似乎选择了带有 SLD 和 TLD(带有可选的 ccTLD)的域名,但存在重复的 TLD 列表.这可以进一步重构吗?

params[:domain_name].downcase.strip.match(/^[a-z0-9-]{2,63}.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|工作)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|me|mil|mobi|博物馆)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|旅行)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])(.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|工作)|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|mil|mobi|博物馆)|(n[acefgilopruz]|名称|网络)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|旅行)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))?$/)

解决方案

请,请,请不要使用固定且极其复杂的正则表达式来匹配已知域名.>

TLD 列表不是静态的,特别是 ICANN 正在寻找新 gTLD 的简化流程.甚至 ccTLD 列表有时也会更改!

查看 http://publicsuffix.org/ 提供的列表并编写一些代码下载并解析该列表.

So, I have been working on this domain name regular expression. So far, it seems to pick up domain names with SLDs and TLDs (with the optional ccTLD), but there is duplication of the TLD listing. Can this be refactored any further?

params[:domain_name].downcase.strip.match(/^[a-z0-9-]{2,63}
.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|
(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|
(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|
(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|
(m[acdghklmnopqrstuvwxyz]|me|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|
(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|
(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])
(.((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|
(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|
(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|
(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|
m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|
(n[acefgilopruz]|name|net)|(om|org)|
(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|
(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))?$/)

解决方案

Please, please, please don't use a fixed and horribly complicated regex like this to match for known domain names.

The list of TLDs is not static, particularly with ICANN looking at a streamlined process for new gTLDs. Even the list of ccTLDs changes sometimes!

Have a look at the list available from http://publicsuffix.org/ and write some code that's able to download and parse that list instead.

这篇关于我可以改进这个正则表达式检查有效域名吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆