Extraction of TLD from URLs and sorting domains and subdomains for each TLD file


Problem description

I have a list of millions of URLs. I need to extract the TLD of each URL and create a separate file per TLD: for example, collect all URLs whose TLD is .com and dump them into one file, put the .edu URLs into another file, and so on. Further, within each file the entries must be sorted alphabetically by domain and then by subdomain.

Can anyone give me a head start for implementing this in Perl?

Answer

  1. Use URI to parse the URL,
  2. use its host method to get the host,
  3. use Domain::PublicSuffix's get_root_domain to parse the host name, and
  4. use the tld or suffix method to get the real TLD or the pseudo-TLD.

use feature qw( say );

use Domain::PublicSuffix qw( );
use URI                  qw( );

my $dps = Domain::PublicSuffix->new();

for (qw(
   http://www.google.com/
   http://www.google.co.uk/
)) {
   my $url = $_;

   # Treat relative URLs as absolute URLs with missing http://.
   $url = "http://$url" if $url !~ /^\w+:/;

   my $host = URI->new($url)->host();
   $host =~ s/\.\z//;  # Strip trailing dot; D::PS doesn't handle "domain.com.".

   $dps->get_root_domain($host)
      or die $dps->error();

   say $dps->tld();     # com  uk
   say $dps->suffix();  # com  co.uk
}
