从给定的URL中提取主域名 [英] Extract main domain name from a given url

查看:358
本文介绍了从给定的URL中提取主域名的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用以下内容从网址中提取域:(它们是测试用例)

I used the following to extract the domain from a url: (They are test cases)

String regex = "^(ww[a-zA-Z0-9-]{0,}\\.)";
ArrayList<String> cases = new ArrayList<String>();
cases.add("www.google.com");
cases.add("ww.socialrating.it");
cases.add("www-01.hopperspot.com");
cases.add("wwwsupernatural-brasil.blogspot.com");
cases.add("xtop10.net");
cases.add("zoyanailpolish.blogspot.com");

for (String t : cases) {  
    String res = t.replaceAll(regex, "");  
}

我可以得到以下结果:

google.com
hopperspot.com
socialrating.it
blogspot.com
xtop10.net
zoyanailpolish.blogspot.com

前四种情况都不错。最后一个不好。我想要的是: blogspot.com 最后一个,但它给出了 zoyanailpolish.blogspot.com 。我做错了什么?

The first four cases are good. The last one is not good. What I want is: blogspot.com for the last one, but it gives zoyanailpolish.blogspot.com. What am I doing wrong?

推荐答案

正如BalusC和其他人所建议的那样,最实际的解决方案是获取TLD列表(请参阅此列表),将其保存到文件中,加载它们,然后确定TLD是什么由给定的url字符串使用。从那里你可以构成主域名,如下所示:

As suggested by BalusC and others the most practical solution would be to get a list of TLDs (see this list), save them to a file, load them and then determine what TLD is being used by a given url String. From there on you could constitute the main domain name as follows:

    String url = "zoyanailpolish.blogspot.com";

    String tld = findTLD( url ); // To be implemented. Add to helper class ?

    url = url.replace( "." + tld,"");  

    int pos = url.lastIndexOf('.');

    String mainDomain = "";

    if (pos > 0 && pos < url.length() - 1) {
        mainDomain = url.substring(pos + 1) + "." + tld;
    }
    // else: Main domain name comes out empty

实施细节留给你。

这篇关于从给定的URL中提取主域名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆