Regex to match Domain.CCTLD

Question

Does anyone know a regular expression to match Domain.CCTLD? I don't want subdomains, only the "atomic domain". For example, docs.google.com should not match, but google.com should. However, this gets complicated by multi-part country-code suffixes like .co.uk. Does anyone know a solution? Thanks in advance.

I've realized I also have to deal with multiple subdomains, like john.doe.google.co.uk. Need a solution now more than ever :P.

Recommended answer

Based on your comment above, I'm going to reinterpret the question: rather than writing a regex that matches these domains, we'll write a function that recognizes them, and apply that function to filter a list of domain names so that it only includes first-level domains, e.g. google.com, amazon.co.uk.

First, we'll need a list of TLDs. As Greg mentioned, the public suffix list is a great place to start. Let's assume you've parsed the list into a Python list called suffixes. If this isn't something you're comfortable with, comment and I can add some code that will do it.

suffixes = parse_suffix_list("suffix_list.txt")
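
One possible way to fill in parse_suffix_list, as a minimal sketch. It assumes the file uses the public suffix list's text format (one rule per line, with comment lines starting with //), and it deliberately skips wildcard (*.) and exception (!) rules to stay short. The helper parse_suffix_lines is a hypothetical name introduced here, not part of the original answer.

```python
def parse_suffix_lines(lines):
    """Return the plain suffix rules from lines in public suffix list format."""
    suffixes = []
    for line in lines:
        # A rule ends at the first whitespace; anything after it is ignored.
        parts = line.split()
        if not parts:
            continue  # blank line
        rule = parts[0]
        if rule.startswith("//"):
            continue  # comment line
        if rule.startswith("*.") or rule.startswith("!"):
            continue  # assumption: wildcard and exception rules are skipped here
        suffixes.append(rule.lower())
    return suffixes


def parse_suffix_list(path):
    """Read the file at `path` and return its plain suffix rules as a list."""
    with open(path, encoding="utf-8") as f:
        return parse_suffix_lines(f)
```

Skipping the wildcard and exception rules means a few suffixes will be handled incorrectly, so a complete parser would need to treat those two rule types as well.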

Now we'll need code that identifies whether a given domain name matches the pattern some-name.suffix:

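def is_domain(d):
    # A bare suffix such as co.uk is not itself a domain
    if d in suffixes:
        return False
    for suffix in suffixes:
        # Require the dot before the suffix so "notcom" does not match "com"
        if d.endswith("." + suffix):
            # Get the base domain name without the suffix and its dot
            base_name = d[:-(len(suffix) + 1)]
            # If it contains '.', it's a subdomain (or a longer suffix applies)
            if base_name and "." not in base_name:
                return True
    # If we get here, no matches were found
    return False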
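
For the follow-up case of several subdomains (john.doe.google.co.uk), a hedged sketch of one way to reduce a host name to its name.suffix part and then filter a list. This uses a longest-suffix match over the labels, which is a different technique from the loop above. SAMPLE_SUFFIXES, registrable_domain and is_first_level are hypothetical names, and the three sample suffixes stand in for a real parsed list.

```python
# Hypothetical sample; in practice this would come from the public suffix list.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk"}


def registrable_domain(host, suffixes=SAMPLE_SUFFIXES):
    """Return the name.suffix part of host, or None if there is none."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the host itself is a bare suffix such as "co.uk".
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def is_first_level(host, suffixes=SAMPLE_SUFFIXES):
    """True when host is exactly name.suffix, with no subdomain in front."""
    return registrable_domain(host, suffixes) == host.lower().rstrip(".")


hosts = ["google.com", "docs.google.com", "amazon.co.uk",
         "john.doe.google.co.uk", "co.uk"]
first_level = [h for h in hosts if is_first_level(h)]
print(first_level)  # ['google.com', 'amazon.co.uk']
print(registrable_domain("john.doe.google.co.uk"))  # google.co.uk
```

Keeping the suffixes in a set makes each lookup cheap, so the cost per host depends on the number of labels rather than the size of the suffix list.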
