Extract domain from url (including the hard ones)


Question

I'm trying to write (or just find an existing) PHP method that can take a link and extract the domain from it. The trick is that it needs to hold up under the weight of strange-looking domains like:

www.champa.kku.ac.th 

Looking at this one with human eyes, I still guessed wrong: I thought the domain would be kku.ac.th, but visiting that gives a DNS error.

So, does anyone know of a good way to reliably extract the domain from URLs like these:

http://site.com/hello.php
http://site.com.uk/hello.php
http://subdomain.site.com/hello.php
http://subdomain.site.com.uk/hello.php
http://www.champa.kku.ac.th/hello.php // and even the one I couldn't tell


Recommended answer

PHP has the parse_url() function, which will help you do the basic splitting into scheme (protocol), host, port, and so on.
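As a minimal sketch of that first step, the snippet below uses parse_url() with the PHP_URL_HOST flag to pull out just the host. The helper name extractHost() is made up for illustration, and prepending "http://" to scheme-less input is an assumption about how bare hostnames should be handled, not something the answer prescribes.

```php
<?php
// Hypothetical helper: return the lower-cased host of a URL, or null if none is found.
function extractHost(string $url): ?string
{
    // parse_url() only recognises a host when a scheme (or leading "//") is present,
    // so bare input such as "www.champa.kku.ac.th" gets a scheme prepended (assumption).
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    // With PHP_URL_HOST, parse_url() returns the host string, null if the URL has
    // no host component, or false if the URL is seriously malformed.
    $host = parse_url($url, PHP_URL_HOST);

    return is_string($host) ? strtolower($host) : null;
}
```

This only gives the full host name (for example www.champa.kku.ac.th); deciding which part of that host is the domain is the harder problem discussed next.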

As for extracting the "right" domain in uncertain cases, this is extremely hard to tell, because some "two-part TLDs" are set up by the TLD authority itself (e.g. in the UK), while others are run by private enterprises (e.g. .uk.com). I think you won't get around maintaining a list of endings that have two parts, like


  • .co.uk

  • .ac.uk

  • .ac.th

Those endings would then be treated like TLDs (top-level domains), swallowing the second part.

This is the only way of reliably telling apart hosts under "two-part TLDs" such as server1.ibm.co.uk (where the two-part .co.uk needs to be removed to determine the domain itself) from regular sub-domains such as server1.ibm.com (where only .com needs to be removed).
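The list-based approach described above could be sketched roughly as follows. The function name registrableDomain() and the constant TWO_PART_TLDS are hypothetical, and the list holds only the three endings mentioned in this answer; a real list would need to be far longer and kept up to date.

```php
<?php
// Hypothetical, deliberately incomplete list of two-part endings to treat as TLDs.
const TWO_PART_TLDS = ['co.uk', 'ac.uk', 'ac.th'];

// Return the TLD (one or two parts) plus exactly one label to its left,
// or null if the host has no label in front of the TLD.
function registrableDomain(string $host, array $twoPartTlds = TWO_PART_TLDS): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);
    if ($count < 2) {
        return null; // e.g. "localhost": there is no TLD to strip
    }

    // Check whether the last two labels form one of the listed two-part endings.
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $tldParts = in_array($lastTwo, $twoPartTlds, true) ? 2 : 1;
    if ($count <= $tldParts) {
        return null; // the host is itself just a listed ending, e.g. "co.uk"
    }

    // Keep the TLD plus the single label immediately before it.
    return implode('.', array_slice($labels, -($tldParts + 1)));
}
```

With this list, server1.ibm.co.uk yields ibm.co.uk and server1.ibm.com yields ibm.com. The host www.champa.kku.ac.th yields kku.ac.th; whether that name resolves when visited is a separate matter, since a registered domain need not have an address record of its own.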

A good starting point for a list of many important "two-part TLDs" is the domain search at speednames.com (select "all" in countries). A more complete list can be found as part of the Ruby domainatrix library.
