Get the second level domain of a URL (Java)


Problem description

I am wondering if there is a parser or library in Java for extracting the second level domain (SLD) of a URL or, failing that, an algorithm or regex for doing the same. For example:

import java.net.URI;

URI uri = new URI("http://www.mydomain.ltd.uk/blah/some/page.html");
String host = uri.getHost();  // host part only, without scheme or path
System.out.println(host);

which prints:

www.mydomain.ltd.uk

Now what I'd like to do is robustly identify the SLD ("ltd.uk") component. Any ideas?

Edit: I'm ideally looking for a general solution, so I'd match ".uk" in "police.uk", ".co.uk" in "bbc.co.uk" and ".com" in "amazon.com".

Thanks

Solution

I don't know your purpose, but the second-level domain as such may not mean much to you. You probably need to find the public suffix (for example "ltd.uk" or "co.uk"); the domain right below it is what you are looking for.
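To make the idea concrete, here is a minimal sketch of the lookup in plain Java. It is not the HttpClient implementation: the class name, method names and the tiny hard-coded rule set are assumptions made up for illustration, and wildcard ("*.") and exception ("!") rules from the real list are not handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PublicSuffixSketch {
    // Hypothetical, hand-picked subset of rules; the real list has thousands of entries.
    static final Set<String> SUFFIXES = new HashSet<>(Arrays.asList(
            "com", "uk", "co.uk", "ltd.uk"));

    /** Returns the longest suffix of host found in SUFFIXES, or null if none matches. */
    static String publicSuffix(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        // Try the whole host first, then drop one leading label at a time,
        // so the first hit is the longest matching suffix.
        while (true) {
            if (SUFFIXES.contains(h)) return h;
            int dot = h.indexOf('.');
            if (dot < 0) return null;
            h = h.substring(dot + 1);
        }
    }

    /** Returns the public suffix plus the one label directly to its left, or null. */
    static String registrableDomain(String host) {
        String suffix = publicSuffix(host);
        // No match, or the host is itself a bare suffix: nothing sits below it.
        if (suffix == null || suffix.length() == host.length()) return null;
        String h = host.toLowerCase(Locale.ROOT);
        // Strip ".suffix" from the end, then keep only the last remaining label.
        String prefix = h.substring(0, h.length() - suffix.length() - 1);
        String label = prefix.substring(prefix.lastIndexOf('.') + 1);
        return label + "." + suffix;
    }

    public static void main(String[] args) {
        for (String host : new String[] {"www.mydomain.ltd.uk", "bbc.co.uk", "amazon.com"}) {
            System.out.println(host + " -> suffix=" + publicSuffix(host)
                    + ", domain=" + registrableDomain(host));
        }
    }
}
```

With this rule set, "www.mydomain.ltd.uk" yields the suffix "ltd.uk" and the domain "mydomain.ltd.uk". A real deployment would load the full list rather than a hard-coded set.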

Apache HttpComponents (HttpClient 4) comes with classes to handle this:

org.apache.http.impl.cookie.PublicSuffixFilter
org.apache.http.impl.cookie.PublicSuffixListParser

You need to download the public suffix list from here:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
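The list is a plain text file: blank lines and lines starting with "//" are ignored, and each rule is the text up to the first whitespace on its line. A rough sketch of reading such a file into a set of rules might look like this. The class name, method name and sample excerpt are hypothetical, and wildcard ("*.") and exception ("!") rules are simply skipped here rather than interpreted.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SuffixListReaderSketch {
    /** Collects plain rules, skipping blanks, "//" comments, wildcard and exception rules. */
    static Set<String> readRules(List<String> lines) {
        Set<String> rules = new LinkedHashSet<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("//")) continue;
            // A rule ends at the first whitespace character.
            String rule = line.split("\\s+")[0];
            // Simplification: wildcard and exception rules are ignored in this sketch.
            if (rule.startsWith("*.") || rule.startsWith("!")) continue;
            rules.add(rule);
        }
        return rules;
    }

    public static void main(String[] args) {
        // Illustrative excerpt only, not the real file contents.
        List<String> sample = Arrays.asList(
                "// a comment line", "com", "", "uk", "co.uk", "ltd.uk", "*.ck", "!www.ck");
        System.out.println(readRules(sample));
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines` on a downloaded copy of the list.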
