Get Root Domain of Link


Question

I have a link such as http://www.techcrunch.com/ and I would like to get just the techcrunch.com part of it. How do I go about this in Python?

Recommended answer

Use urlparse to extract the hostname:

import urllib.parse as urlparse  # Python 3; on Python 2, use "import urlparse" instead
hostname = urlparse.urlparse("http://www.techcrunch.com/").hostname
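As a rough illustration of what the hostname attribute returns, here is a small Python 3 sketch. The helper name get_hostname is hypothetical and only wraps the standard-library call; note that a URL without a scheme yields no hostname at all.

```python
from urllib.parse import urlparse


def get_hostname(url):
    """Return the lowercased hostname of url, or None if none can be parsed."""
    return urlparse(url).hostname


# The hostname still carries the "www." prefix; it is not yet the root domain.
print(get_hostname("http://www.techcrunch.com/"))

# The port is dropped and the name is lowercased.
print(get_hostname("HTTP://WWW.TechCrunch.COM:8080/path"))

# Without a scheme the text is treated as a path, so there is no hostname.
print(get_hostname("www.techcrunch.com"))
```

If the input may lack a scheme, it probably needs to be normalized (for example by prepending "http://") before parsing.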

Getting the "root domain", however, is more problematic, because it isn't defined in a syntactic sense. What's the root domain of "www.theregister.co.uk"? How about networks using default domains? "devbox12" could be a valid hostname.
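One way to make the question answerable is to decide up front which suffixes count as registrable, rather than inferring that from the hostname's shape. The sketch below assumes a tiny hand-picked set, KNOWN_SUFFIXES, which is purely hypothetical; a realistic list would be far longer. The function name root_domain_from_suffixes is likewise made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical, hand-picked sample for illustration only.
KNOWN_SUFFIXES = {"co.uk", "com.au", "com", "org", "net"}


def root_domain_from_suffixes(url):
    """Return the known suffix plus one preceding label, or the bare host."""
    host = urlparse(url).hostname
    if host is None:
        return None
    labels = host.split(".")
    # Try the longest candidate suffix first so "co.uk" is matched before "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            # Keep one label in front of the suffix when there is one.
            return ".".join(labels[max(i - 1, 0):])
    # No known suffix: a bare name such as "devbox12" is returned unchanged.
    return host


print(root_domain_from_suffixes("http://www.theregister.co.uk/"))
print(root_domain_from_suffixes("http://devbox12/"))
```

The result is only as good as the suffix set, so this trades the syntactic ambiguity for a maintenance problem.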

For the most common cases, however, you can probably handle the former specially and ignore the latter, but be aware that it won't be 100% accurate.

parts = urlparse.urlparse(url).hostname.split(".")
# Keep three labels when the next-to-last one is short (as in ".co.uk"), otherwise two.
hostname = ".".join(parts[-3:] if len(parts[-2]) < 4 else parts[-2:])

This uses the last three parts if the next-to-last part is shorter than four characters (e.g. ".com.au", ".co.uk") and the last two parts otherwise.
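The heuristic can be wrapped in a function to see where it holds and where it misfires. This is a minimal Python 3 sketch; the name guess_root_domain and the guards for missing or short hostnames are additions for illustration, not part of the original answer.

```python
from urllib.parse import urlparse


def guess_root_domain(url):
    """Guess the root domain from the length of the next-to-last label."""
    host = urlparse(url).hostname
    if host is None:
        return None
    parts = host.split(".")
    # Bare names ("devbox12") and two-label hosts are returned unchanged.
    if len(parts) < 3:
        return host
    # A short next-to-last label suggests a two-part suffix such as ".co.uk".
    keep = 3 if len(parts[-2]) < 4 else 2
    return ".".join(parts[-keep:])


for url in (
    "http://www.techcrunch.com/",     # last two parts are kept
    "http://www.theregister.co.uk/",  # "co" is short, so three parts are kept
    "http://www.bbc.com/",            # misfire: "bbc" is short, "www." survives
    "http://devbox12/",               # bare hostname passes through
):
    print(url, "->", guess_root_domain(url))
```

The third case shows why the approach is not fully accurate: any registered name shorter than four characters is mistaken for part of the suffix.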
