How to get the domain name (name + TLD) from a URL in Python

Question

I want to extract the domain name (name of the site + TLD) from a list of URLs whose formats may vary. For instance (current state ----> what I want):

mail.yahoo.com ----> yahoo.com
account.hotmail.co.uk ----> hotmail.co.uk
x.it ----> x.it
google.mail.com ----> google.com

Is there any Python code that can help me extract what I want from a URL, or should I do it manually?

Solution

This is somewhat non-trivial, as there is no simple rule for deciding which part of a hostname is the registered domain (site name + public suffix): the suffix can be a single label such as com or several labels such as co.uk. Instead, the set of public suffixes is maintained as a list at PublicSuffix.org.
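To illustrate why a list is needed, here is a minimal sketch of the lookup idea: find the longest suffix of the hostname that appears in the list, then keep one more label to its left. The `SAMPLE_SUFFIXES` set and the `registrable_domain` function are hypothetical names made up for this example, and the set is a tiny hand-picked subset rather than the real list.

```python
# Hypothetical, hand-picked subset of the Public Suffix List (illustration only).
SAMPLE_SUFFIXES = {"com", "it", "uk", "co.uk"}


def registrable_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the site name plus its public suffix, or None if nothing matches."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so that "co.uk" is preferred over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            # Keep exactly one label to the left of the matched suffix.
            return ".".join(labels[i - 1:])
    return None


print(registrable_domain("mail.yahoo.com"))         # yahoo.com
print(registrable_domain("account.hotmail.co.uk"))  # hotmail.co.uk
print(registrable_domain("x.it"))                   # x.it
```

The real list is far longer and also contains wildcard and exception rules, which this sketch ignores.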

A Python package exists that queries that list (stored locally); it's called publicsuffix:

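>>> from publicsuffix import PublicSuffixList
>>> psl = PublicSuffixList()  # uses the locally stored copy of the list
>>> # Despite its name, get_public_suffix() returns site name + public suffix.
>>> print(psl.get_public_suffix('mail.yahoo.com'))
yahoo.com
>>> print(psl.get_public_suffix('account.hotmail.co.uk'))
hotmail.co.uk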
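
The question mentions URLs in varying formats, while the lookup takes a bare hostname. One possible way to reduce a URL to its hostname first is the standard library's `urllib.parse.urlsplit`. This is only a sketch: `hostname_from_url` is a hypothetical helper name, and it assumes that inputs without a scheme begin with the hostname.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def hostname_from_url(url):
    """Return the lowercase hostname of a URL, tolerating a missing scheme."""
    url = url.strip()
    # urlsplit only recognises the network location after "//",
    # so prepend it for bare inputs such as "mail.yahoo.com/inbox".
    if not _SCHEME.match(url) and not url.startswith("//"):
        url = "//" + url
    return urlsplit(url).hostname


print(hostname_from_url("https://mail.yahoo.com/inbox?x=1"))  # mail.yahoo.com
print(hostname_from_url("account.hotmail.co.uk"))             # account.hotmail.co.uk
```

The result can then be passed to psl.get_public_suffix() as in the session above.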
