Implementing Public Suffix extraction using Java


Question

I need to extract the top domain of a URL, and I found http://publicsuffix.org/index.html for this.

There is a Java implementation in http://guava-libraries.googlecode.com, but I could not find any example of extracting the domain name.

For example:
example.google.com
returns google.com

and bing.bing.bing.com
returns bing.com

Can anyone show me how to do this with the library, with an example?

Recommended answer

It looks to me like InternetDomainName.topPrivateDomain() does exactly what you want. Guava maintains a list of public suffixes (based on Mozilla's list at publicsuffix.org) that it uses to determine which part of the host is the public suffix. The top private domain is the public suffix plus its first child: for www.amazon.co.jp the public suffix is co.jp, so the top private domain is amazon.co.jp.
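To make the "public suffix plus its first child" rule concrete, here is a minimal sketch in plain Java. It is not Guava's implementation: the class name TopPrivateDomainSketch and the three-entry SAMPLE_SUFFIXES set are hypothetical stand-ins for the real list, and the wildcard and exception rules that the real list contains are left out.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

public class TopPrivateDomainSketch {
  // Hypothetical, tiny stand-in for the real public suffix list (illustration only).
  private static final Set<String> SAMPLE_SUFFIXES = Set.of("com", "jp", "co.jp");

  /**
   * Returns the longest matching sample suffix plus the label to its left,
   * or null if the host is itself a suffix or matches no sample suffix.
   */
  public static String topPrivateDomain(String host) {
    String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
    // Try the longest candidate first so that "co.jp" wins over "jp".
    for (int i = 0; i < labels.length; i++) {
      String candidate =
          String.join(".", Arrays.copyOfRange(labels, i, labels.length));
      if (SAMPLE_SUFFIXES.contains(candidate)) {
        if (i == 0) {
          return null; // the whole host is a public suffix: no private domain
        }
        return labels[i - 1] + "." + candidate; // suffix plus its first child
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(topPrivateDomain("example.google.com")); // google.com
    System.out.println(topPrivateDomain("www.amazon.co.jp"));   // amazon.co.jp
    System.out.println(topPrivateDomain("co.jp"));              // null
  }
}
```

In real code the Guava call is preferable, because it ships with the full list and handles the rules this sketch skips.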

Here's a quick example using Guava:

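import com.google.common.collect.ImmutableList;
import com.google.common.net.InternetDomainName;
import java.net.URI;
import java.net.URISyntaxException;

public class Test {
  public static void main(String[] args) throws URISyntaxException {
    ImmutableList<String> urls = ImmutableList.of(
        "http://example.google.com", "http://google.com",
        "http://bing.bing.bing.com", "http://www.amazon.co.jp/");
    for (String url : urls) {
      System.out.println(url + " -> " + getTopPrivateDomain(url));
    }
  }

  private static String getTopPrivateDomain(String url) throws URISyntaxException {
    // Parse the URL and keep only the host part, e.g. "example.google.com".
    String host = new URI(url).getHost();
    InternetDomainName domainName = InternetDomainName.from(host);
    // toString() yields the domain text; older Guava releases also offered name() for this.
    return domainName.topPrivateDomain().toString();
  }
}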

Running this code prints:

http://example.google.com -> google.com
http://google.com -> google.com
http://bing.bing.bing.com -> bing.com
http://www.amazon.co.jp/ -> amazon.co.jp
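
One thing to watch in the host-extraction step: java.net.URI.getHost() returns null when the input has no scheme (for example a bare "example.google.com"), and a null host cannot be handed to InternetDomainName.from. The sketch below uses only the JDK and shows one way to guard that step. The class name HostExtractionSketch and the helper hostOf are hypothetical, not part of Guava or of the answer above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

public class HostExtractionSketch {
  /** Returns the host of the URL, or empty if it cannot be parsed or has no host. */
  public static Optional<String> hostOf(String url) {
    try {
      // getHost() is null for inputs such as "example.google.com" (no scheme).
      return Optional.ofNullable(new URI(url).getHost());
    } catch (URISyntaxException e) {
      return Optional.empty(); // malformed input, e.g. containing a space
    }
  }

  public static void main(String[] args) {
    System.out.println(hostOf("http://www.amazon.co.jp/")); // Optional[www.amazon.co.jp]
    System.out.println(hostOf("example.google.com"));       // Optional.empty
  }
}
```

A caller could then pass the host to InternetDomainName.from only when the Optional is present.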
