Regex to match Domain.CCTLD
Question
Does anyone know a regular expression to match Domain.CCTLD? I don't want subdomains, only the "atomic domain". For example, docs.google.com doesn't get matched, but google.com does. However, this gets complicated with multi-part suffixes like .co.uk under CCTLDs. Does anyone know a solution? Thanks in advance.
EDIT: I've realized I also have to deal with multiple subdomains, like john.doe.google.co.uk. Need a solution now more than ever :P.
Based on your comment above, I'm going to reinterpret the question -- rather than making a regex that will match them, we'll create a function that will match them, and apply that function to filter a list of domain names to only include first class domains, e.g. google.com, amazon.co.uk.
First, we'll need a list of TLDs. As Greg mentioned, the public suffix list is a great place to start. Let's assume you've parsed the list into a Python list called suffixes. If this isn't something you're comfortable with, comment and I can add some code that will do it.
suffixes = parse_suffix_list("suffix_list.txt")
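The parse_suffix_list helper isn't shown here, so this is only a rough sketch of what it might look like. It assumes suffix_list.txt is a local copy of the public suffix list, where comments start with "//" and each rule is the first whitespace-delimited token on its line. The parse_suffix_lines helper is a name I made up for illustration, and wildcard ("*.") and exception ("!") rules are skipped to keep things simple.

```python
def parse_suffix_lines(lines):
    """Collect plain suffix rules from an iterable of text lines.

    Hypothetical helper: a simplified reading of the public suffix list format.
    """
    suffixes = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and "//" comments.
        if not line or line.startswith("//"):
            continue
        # A rule is the text up to the first whitespace.
        rule = line.split()[0].lower()
        # Simplification (assumption): ignore wildcard and exception rules.
        if rule.startswith("*.") or rule.startswith("!"):
            continue
        suffixes.append(rule)
    return suffixes


def parse_suffix_list(path):
    """Read a local copy of the suffix list and return its plain rules."""
    with open(path, encoding="utf-8") as f:
        return parse_suffix_lines(f)
```

Skipping the wildcard and exception rules means a few suffixes will be handled incorrectly, so treat this as a starting point rather than a complete parser.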
Now we'll need code that identifies whether a given domain name matches the pattern some-name.suffix:
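def is_domain(d):
    # A bare suffix such as "co.uk" is not a domain on its own.
    if d in suffixes:
        return False
    for suffix in suffixes:
        # Match on a label boundary so "notcom" does not match "com".
        if d.endswith("." + suffix):
            # Get the base domain name without the suffix and its dot
            base_name = d[:-(len(suffix) + 1)]
            # If it contains '.', it's a subdomain.
            if base_name and "." not in base_name:
                return True
    # If we get here, no matches were found
    return False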
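To apply this to a list of domain names, something like the following sketch could work. It is a variant of the same idea, not a drop-in replacement: it keeps the suffixes in a set and looks for the longest matching suffix first, so john.doe.google.co.uk is rejected and a bare suffix like co.uk is not mistaken for a domain. SUFFIXES is a tiny made-up sample, and is_first_class_domain and filter_domains are hypothetical names.

```python
# Hypothetical sample; in practice use the parsed public suffix list.
SUFFIXES = {"com", "uk", "co.uk"}


def is_first_class_domain(d, suffixes=SUFFIXES):
    """Return True if d is exactly one label followed by a known suffix."""
    labels = d.lower().split(".")
    # Try candidate suffixes from longest (the whole name) to shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # The longest matching suffix starts at label i, so the name is
            # a first class domain only if exactly one label precedes it.
            return i == 1
    # No known suffix matched at all.
    return False


def filter_domains(domains, suffixes=SUFFIXES):
    """Keep only the first class domains from a list of names."""
    return [d for d in domains if is_first_class_domain(d, suffixes)]


names = [
    "google.com",
    "docs.google.com",
    "amazon.co.uk",
    "john.doe.google.co.uk",
    "co.uk",
]
print(filter_domains(names))  # ['google.com', 'amazon.co.uk']
```

Using a set makes each lookup cheap, which matters because the full suffix list has thousands of entries.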