Need a regular expression to capture second level domain (SLD)


Question

I need a regular expression to capture the SLD of a given URL.

Examples:

jack.bop.com -> bop
bop.com -> bop
bop.de -> bop
bop.co.uk -> bop
bop.com.br -> bop

All bops :). So this regex needs to ignore ccTLDs, gTLDs and ccSLDs. The last of these is the difficult part, since I want to keep the regex as simple as possible.

The first task would be to remove ccTLDs, then gTLDs, and then check for ccSLDs and remove them if present.

Any help is much appreciated :)

-

If it helps, ccTLDs are matched by:

\.([a-z]{2})$

gTLDs are matched by:

\.([a-z]{3,6})$

Luckily, these two patterns are mutually exclusive.
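
As a quick check of the two patterns above, here is a minimal Python sketch. The helper name `tld_kind` is hypothetical, and the gTLD quantifier is written as `{3,6}`, which is the valid range syntax. It only classifies the last label of a host name.

```python
import re

# Two-letter final label, e.g. ".de" or ".uk".
CC_TLD = re.compile(r"\.([a-z]{2})$")
# Three- to six-letter final label, e.g. ".com" or ".museum".
G_TLD = re.compile(r"\.([a-z]{3,6})$")


def tld_kind(host):
    """Classify the last label of host as 'ccTLD', 'gTLD', or None (hypothetical helper)."""
    if CC_TLD.search(host):
        return "ccTLD"
    if G_TLD.search(host):
        return "gTLD"
    return None


for host in ["bop.de", "bop.com", "bop.co.uk"]:
    print(host, tld_kind(host))
```

Because the two length ranges do not overlap, at most one pattern can match a given host. Note that stripping only the last label of `bop.co.uk` leaves `bop.co`, which is why the ccSLD case needs separate handling.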

Recommended answer

Technically, '.co.uk' is the second level domain in 'bop.co.uk'. What you seem to be asking for is the highest-level part of the domain that is open to public registration, with the registry's own suffix stripped off.

RFC 6265 §5.3 (https://tools.ietf.org/html/rfc6265#section-5.3) calls the suffix that you don't want a "public suffix":


A "public suffix" is a domain that is controlled by a public registry, such as "com", "co.uk", and "pvt.k12.wy.us".

Mozilla maintains a list of all known public suffixes.

To create your regex, you'll have to enumerate all of the public suffixes. You should order them so that elements that are suffixes of other elements appear later. An easy way to do this is to sort by descending length. It looks like reversing Mozilla's list would also suffice.
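
The ordering step can be sketched in Python as follows. This is an illustration under assumptions: `build_suffix_alternation` is a hypothetical helper, and `SAMPLE_SUFFIXES` is a tiny hand-picked sample, not the real list.

```python
import re


def build_suffix_alternation(suffixes):
    """Join suffixes into a regex alternation, longest first (hypothetical helper).

    Sorting by descending length guarantees that a suffix such as "uk"
    appears after any longer suffix that ends with it, such as "co.uk".
    """
    ordered = sorted(suffixes, key=len, reverse=True)
    # re.escape turns the literal dots into "\." so they match only a dot.
    return "|".join(re.escape(s) for s in ordered)


# Small sample for illustration only.
SAMPLE_SUFFIXES = ["uk", "com", "co.uk", "org", "gov.uk", "net", "us", "ac.uk"]
alternation = build_suffix_alternation(SAMPLE_SUFFIXES)
print(alternation)
```

In practice the input would be generated from the full public suffix list rather than typed by hand.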

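After that, the regex is pretty straightforward. The optional prefix is matched lazily so that the captured label sits directly to the left of the longest matching suffix (a greedy prefix would capture 'co' for 'bop.co.uk'):

^(.+?\.)??([^.]+)\.(?:<suffixes>)$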

Where <suffixes> would be the |-separated list of suffixes. A piece of it would look something like:

gov\.uk|ac\.uk|co\.uk|com|org|net|us|uk
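
To see the whole thing run against the examples from the question, here is a minimal Python sketch. The names `SUFFIXES`, `PATTERN` and `registrable_label` are hypothetical, and the suffix sample is extended by hand with `com\.br`, `de` and `br` so that every example is covered. The sketch uses a lazy optional prefix, `(.+?\.)??`, so the engine tries the shortest prefix first and the captured label ends up next to the longest matching suffix; with a greedy prefix, `bop.co.uk` would yield `co` instead of `bop`.

```python
import re

# Hand-written sample; a real alternation would be generated from the full list.
SUFFIXES = r"gov\.uk|ac\.uk|co\.uk|com\.br|com|org|net|de|br|us|uk"

# Group 1: optional subdomain prefix (lazy). Group 2: the label we want.
PATTERN = re.compile(r"^(.+?\.)??([^.]+)\.(?:" + SUFFIXES + r")$")


def registrable_label(host):
    """Return the label directly left of the public suffix, or None (hypothetical helper)."""
    match = PATTERN.match(host.lower())
    return match.group(2) if match else None


for host in ["jack.bop.com", "bop.com", "bop.de", "bop.co.uk", "bop.com.br"]:
    print(host, "->", registrable_label(host))
```

Hosts whose suffix is missing from the sample simply return `None`, which is one reason a hand-maintained list does not scale.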

There are ways to make this shorter by collapsing common suffixes, though this makes the regex (and the process of computing it) much more complex. For example:

(?:gov\.|ac\.|co\.|)uk|com|org|net|us
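
A small Python check, offered as a sketch, that the collapsed form accepts the same strings as the flat form for the sample above. The names `FLAT`, `COLLAPSED` and `candidates` are hypothetical, and the candidate list is a spot check rather than a proof of equivalence.

```python
import re

# Flat alternation from the earlier sample.
FLAT = re.compile(r"(?:gov\.uk|ac\.uk|co\.uk|com|org|net|us|uk)")
# Collapsed form: the empty branch inside the group covers plain "uk".
COLLAPSED = re.compile(r"(?:(?:gov\.|ac\.|co\.|)uk|com|org|net|us)")

candidates = ["gov.uk", "ac.uk", "co.uk", "uk", "com", "org", "net", "us",
              "de", "couk", "co.com"]

# Both patterns should accept or reject each candidate identically.
agree = all(
    bool(FLAT.fullmatch(c)) == bool(COLLAPSED.fullmatch(c)) for c in candidates
)
print(agree)
```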
