Get Root Domain of Link

Published: 2017/6/9 20:10:29. Tags: python, regex, dns, root


Question

I have a link such as http://www.techcrunch.com/ and I would like to get just the techcrunch.com part of the link. How do I go about this in Python?

Recommended answer

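Use urlparse to get the hostname:

from urllib.parse import urlparse  # Python 3; in Python 2 this lived in the urlparse module
hostname = urlparse("http://www.techcrunch.com/").hostname  # "www.techcrunch.com"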
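One detail worth keeping in mind is that urlparse only fills in the hostname when the URL has a scheme or starts with "//"; a bare "www.techcrunch.com/" is treated as a path. The sketch below shows one way to cope with that. The helper name `get_hostname` is hypothetical, and treating any input without "://" as a bare host is an assumption made for illustration.

```python
from urllib.parse import urlparse


def get_hostname(url):
    """Return the lowercased hostname of url, or None if there is none.

    Assumption: input without "://" is a bare host such as
    "www.techcrunch.com/", so "//" is prepended to make urlparse
    treat the first segment as the network location.
    """
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url).hostname


# Without a scheme, urlparse puts everything into the path.
print(urlparse("www.techcrunch.com/").hostname)
print(get_hostname("www.techcrunch.com/"))
print(get_hostname("http://WWW.TechCrunch.com:8080/some/page"))
```

The `hostname` attribute already drops the port and lowercases the name, so no extra cleanup is needed for those cases.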

Getting the "root domain", however, is more problematic, because it is not defined in a syntactic sense. What is the root domain of "www.theregister.co.uk"? What about networks that use default domains? "devbox12" could be a valid hostname.
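To make the problem concrete, here is a small sketch of the most obvious purely syntactic rule, keeping the last two dot-separated labels. The function name `last_two_labels` is hypothetical and exists only to illustrate why such a rule cannot be right for every hostname.

```python
def last_two_labels(hostname):
    """Naive rule: keep only the last two dot-separated labels."""
    return ".".join(hostname.split(".")[-2:])


for host in ("www.techcrunch.com", "www.theregister.co.uk", "devbox12"):
    print(host, "->", last_two_labels(host))
```

The rule works for www.techcrunch.com, but it reduces www.theregister.co.uk to co.uk, which is a suffix rather than a site, and it has nothing to trim from a single-label name such as devbox12.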

For the most common cases, however, you can probably handle the former (multi-part suffixes such as ".co.uk") specially and ignore the latter (bare hostnames such as "devbox12"), but be aware that it won't be 100% accurate.

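from urllib.parse import urlparse  # Python 3
# url is the link to process; its hostname must contain at least two labels
parts = urlparse(url).hostname.split(".")
hostname = ".".join(parts[-3:] if len(parts[-2]) < 4 else parts[-2:])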

This uses the last three parts if the next-to-last part is shorter than four characters (e.g. ".com.au", ".co.uk") and the last two parts otherwise.
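The heuristic above can be wrapped in a function with a couple of guards. This is a minimal sketch, not a definitive implementation: the name `root_domain` and the choice to return short hostnames unchanged (and None when there is no hostname) are assumptions made for illustration.

```python
from urllib.parse import urlparse


def root_domain(url):
    """Heuristic root domain of url.

    Keeps the last three labels when the next-to-last label is shorter
    than four characters (".co.uk", ".com.au"), otherwise the last two.
    """
    hostname = urlparse(url).hostname
    if hostname is None:
        return None  # no network location, e.g. a URL without a scheme
    parts = hostname.split(".")
    if len(parts) < 3:
        return hostname  # "devbox12" or "techcrunch.com": nothing to trim
    take = 3 if len(parts[-2]) < 4 else 2
    return ".".join(parts[-take:])


for link in (
    "http://www.techcrunch.com/",
    "http://www.theregister.co.uk/",
    "http://devbox12/",
    "http://www.bbc.com/",  # short name under .com: the heuristic misfires
):
    print(link, "->", root_domain(link))
```

The last case shows the limit of the approach: "bbc" is shorter than four characters, so the function keeps www.bbc.com instead of bbc.com. Any registered name of three characters or fewer under a single-label suffix is misjudged in the same way.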
