Extract registered domain from URL based on Public Suffix List

Question

Given a URL, how do I extract the registered domain using the Public Suffix List (list of effective TLDs, e.g. this list)?

For instance, given that a.bg is a valid public suffix:

http://www.test.start.a.bg/hello.html  ->  start.a.bg
http://test.start.a.bg/                ->  start.a.bg
http://test.start.abc.bg/              ->  abc.bg  (.bg is the public suffix)

This cannot be done using simple string manipulation because the public suffix can consist of multiple levels depending on the TLD.
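
The matching itself can be sketched in a few lines of PHP. This is a minimal sketch, not the regdom library's code: registrableDomain() and the $rules format (rules stored as array keys) are hypothetical. It follows the matching algorithm published by the Public Suffix List project: check every suffix of the host, let an exception rule (!...) win, otherwise take the longest matching normal or wildcard (*.) rule, and fall back to the implicit "*" rule so an unlisted TLD still counts as a public suffix.

```php
<?php
/**
 * Hypothetical helper (not part of regdom): returns the registrable domain of $host,
 * i.e. its public suffix plus one more label, or null if $host is itself a public suffix.
 * $rules holds PSL rules as array keys, e.g. ['bg' => true, 'a.bg' => true, '*.ck' => true, '!www.ck' => true].
 */
function registrableDomain(string $host, array $rules): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $n = count($labels);

    $longestMatch = 1;      // implicit default rule "*": the last label is a public suffix
    $exceptionMatch = null; // label count of the first (longest) matching "!" rule

    // Check every suffix of the host, from all n labels down to the last one.
    for ($i = 0; $i < $n; $i++) {
        $len = $n - $i;
        $suffix = implode('.', array_slice($labels, $i));

        if ($exceptionMatch === null && isset($rules['!' . $suffix])) {
            $exceptionMatch = $len;
        }

        // A wildcard rule "*.x.y" matches any single label in front of "x.y".
        $wildcardHit = $len > 1
            && isset($rules['*.' . implode('.', array_slice($labels, $i + 1))]);

        if (isset($rules[$suffix]) || $wildcardHit) {
            $longestMatch = max($longestMatch, $len);
        }
    }

    // An exception rule beats all other rules; its public suffix drops its leftmost label.
    $suffixLen = $exceptionMatch !== null ? $exceptionMatch - 1 : $longestMatch;

    if ($n <= $suffixLen) {
        return null; // e.g. "a.bg" or "co.uk" on their own
    }

    // Public suffix plus one label to its left.
    return implode('.', array_slice($labels, $n - $suffixLen - 1));
}
```

With the rules bg and a.bg, www.test.start.a.bg yields start.a.bg while test.start.abc.bg yields abc.bg, matching the examples in the question. The sketch does no IDN handling: the host and the rules must use the same encoding (both Unicode or both punycode).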

P.S. It doesn't matter how I read the list (database or flat file), but the list should be accessible locally so I'm not always dependent on external services.
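
Keeping the list in a local flat file is straightforward because the format is simple: one rule per line, comment lines starting with //, and blank lines to ignore. A minimal sketch, assuming a locally downloaded copy of public_suffix_list.dat; parsePublicSuffixList() is a hypothetical helper that returns the rules as array keys.

```php
<?php
/**
 * Hypothetical helper: parses the text of a local copy of public_suffix_list.dat
 * into a set of rules stored as array keys, e.g. ['bg' => true, 'a.bg' => true].
 */
function parsePublicSuffixList(string $text): array
{
    $rules = [];
    foreach (explode("\n", $text) as $line) {
        $line = trim($line); // also removes a trailing "\r" from CRLF files

        // Skip blank lines and comment lines (those starting with "//").
        if ($line === '' || strncmp($line, '//', 2) === 0) {
            continue;
        }

        // A rule is the part of the line before the first space or tab.
        $rule = substr($line, 0, strcspn($line, " \t"));
        $rules[$rule] = true;
    }
    return $rules;
}
```

Usage, with a hypothetical path: $rules = parsePublicSuffixList(file_get_contents('/var/data/public_suffix_list.dat')); Storing rules as keys makes each lookup a single isset() check, and the parsed array could be cached (for example with var_export()) so the file is not re-parsed on every request.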

Recommended answer

You can use parse_url() to extract the hostname, then use the library provided by regdom to determine the registered domain name (domain name plus eTLD). For example:

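<?php
require_once 'effectiveTLDs.inc.php'; // eTLD data from the regdom library
require_once 'regDomain.inc.php';     // defines getRegisteredDomain()

$url = 'http://www.metu.edu.tr/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST); // "www.metu.edu.tr"
echo getRegisteredDomain($host);       // registered domain: dn + eTLD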

This prints metu.edu.tr.
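
One caveat with parse_url(): it generally reports a host only when the URL has a scheme (or starts with //), so input such as www.google.com/search yields no host at all. A minimal sketch of a hypothetical hostFromUrl() wrapper that retries with an assumed http:// prefix and normalizes the result (lowercase, no trailing dot):

```php
<?php
/**
 * Hypothetical helper: returns the lowercase host of $url, or null if none is found.
 * If parse_url() finds no host and the input has no "://", it retries with an
 * assumed "http://" prefix, so "www.google.com/search" still yields a host.
 */
function hostFromUrl(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);

    if (!is_string($host) && strpos($url, '://') === false) {
        $host = parse_url('http://' . $url, PHP_URL_HOST);
    }

    if (!is_string($host) || $host === '') {
        return null;
    }

    // Host names are case-insensitive; a trailing dot only marks a fully qualified name.
    return strtolower(rtrim($host, '.'));
}
```

Its result can be passed to getRegisteredDomain() in place of the bare parse_url() call; a null return should be handled before the lookup.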

Other examples I tried:

http://www.xyz.start.bg/hello   ->   start.bg
http://www.start.a.bg/world     ->   start.a.bg  (a.bg is a listed eTLD)
http://xyz.ma219.metu.edu.tr    ->   metu.edu.tr
http://www.google.com/search    ->   google.com
http://google.co.uk/search?asd  ->   google.co.uk

UPDATE: These libraries have been moved to: https://github.com/leth/registered-domains-php