Using Matcher to extract URL domain name


Question

    static String AdrPattern="http://www.([^&]+)\\.com\\.*";
    static Pattern WebUrlPattern = Pattern.compile (AdrPattern);
    static Matcher WebUrlMatcher;

    WebUrlMatcher = WebUrlPattern.matcher ("keyword");
    if(WebUrlMatcher.matches()) {
        String  extractedPath = WebUrlMatcher.group (1);
    }

Given the code above, my aim is to extract the domain name from the URL and discard the rest. There are two problems: first, if the URL has a deeper path, the path is not ignored; second, the pattern does not work for every URL with a .com extension.

For example, if the URL is http://www.lego.com/en-us/technic/?domainredir=technic.lego, the result is not lego but lego.com/en-us/technic/?domainredir=technic.lego.

Recommended answer

Use

static String AdrPattern="http://www\\.([^&]+)\\.com.*";
                                    ^^              ^

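You escaped the final dot, so \\.* matched only literal dots. Because matches() requires the pattern to match the entire string, any URL with something other than dots after .com could not match. The first dot, the one after www, must also be escaped so that it matches a literal dot rather than any character.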
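To make the difference concrete, here is a minimal sketch that runs the question's pattern and the corrected one against the example URL. The class name DomainMatchDemo and the helper extract are made up for this illustration; only the two patterns and the URL come from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainMatchDemo {
    // Pattern from the question: unescaped dot after "www", escaped dot before "*".
    static final Pattern ORIGINAL = Pattern.compile("http://www.([^&]+)\\.com\\.*");
    // Pattern from the answer: literal dot after "www", ".*" consumes the rest of the URL.
    static final Pattern FIXED = Pattern.compile("http://www\\.([^&]+)\\.com.*");

    // Hypothetical helper: returns group 1 if the pattern matches the whole URL, else null.
    static String extract(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.lego.com/en-us/technic/?domainredir=technic.lego";
        System.out.println(extract(ORIGINAL, url)); // null: the path after ".com" is not a run of dots
        System.out.println(extract(FIXED, url));    // lego
    }
}
```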

Also, to make the regex a bit stricter, you can replace [^&]+ with [^/&]+, so that the captured group cannot extend past the first slash.
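The stricter character class matters when ".com" appears again later in the URL. The sketch below compares the two variants on a made-up URL (http://www.lego.com/redirect.com/x is hypothetical, chosen only to trigger the difference); the class and method names are likewise illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrictHostDemo {
    // Loose variant: the group may contain slashes, so greedy matching can swallow path text.
    static final Pattern LOOSE = Pattern.compile("http://www\\.([^&]+)\\.com.*");
    // Strict variant: the group stops at the first slash.
    static final Pattern STRICT = Pattern.compile("http://www\\.([^/&]+)\\.com.*");

    // Hypothetical helper: returns group 1 on a full match, else null.
    static String extract(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical URL with a second ".com" inside the path.
        String url = "http://www.lego.com/redirect.com/x";
        System.out.println(extract(LOOSE, url));  // lego.com/redirect
        System.out.println(extract(STRICT, url)); // lego
    }
}
```

With the loose class, the greedy quantifier backtracks only as far as the last ".com" in the string, which is why the capture runs into the path.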

Update:

// Requires at least three path segments after the domain.
static String AdrPattern="http://www\\.([^/&]+)\\.com/([^/]+)/([^/]+)/([^/]+).*";
static Pattern WebUrlPattern = Pattern.compile (AdrPattern);
static Matcher WebUrlMatcher = WebUrlPattern.matcher("http://www.lego.com/en-us/technic/?domainredir=technic.lego");
if(WebUrlMatcher.matches()) {
    String  extractedPath = WebUrlMatcher.group(1);   // lego
    String  extractedPart1 = WebUrlMatcher.group(2);  // en-us
    String  extractedPart2 = WebUrlMatcher.group(3);  // technic
    String  extractedPart3 = WebUrlMatcher.group(4);  // ?domainredir=technic.lego
}

Alternatively, use \G:

static String AdrPattern="(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)";
// static String AdrPattern="http://www\\.([^/&]+)\\.com/([^/]+)/([^/]+)/([^/]+)";
static Pattern WebUrlPattern = Pattern.compile (AdrPattern);
static Matcher WebUrlMatcher = WebUrlPattern.matcher("http://www.lego.com/en-us/technic/?domainredir=technic.lego");
int cnt = 0;
while(WebUrlMatcher.find()) {
    if (cnt == 0) {
       String extractedPath = WebUrlMatcher.group(1);
       String extractedPart = WebUrlMatcher.group(2);
       cnt = cnt + 1;
    }
    else {
       String extractedPart = WebUrlMatcher.group(2);
    }
}
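As a self-contained sketch of the \G approach, the version below collects the domain and every path segment into a list rather than into local variables. The class name PathSegmentsDemo and the method split are assumptions made for this example; the pattern and URL are the ones from the answer. On the first find() the left alternative matches and fills group 1; on later calls \G anchors the match at the end of the previous one, so group 1 is null and only group 2 carries a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathSegmentsDemo {
    static final Pattern SEGMENTS =
        Pattern.compile("(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)");

    // Hypothetical helper: returns the domain name followed by each path segment.
    // The list is empty when the URL does not match at all.
    static List<String> split(String url) {
        List<String> parts = new ArrayList<>();
        Matcher m = SEGMENTS.matcher(url);
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(m.group(1)); // set only on the first match
            }
            parts.add(m.group(2));     // one path segment per match
        }
        return parts;
    }

    public static void main(String[] args) {
        String url = "http://www.lego.com/en-us/technic/?domainredir=technic.lego";
        // prints [lego, en-us, technic, ?domainredir=technic.lego]
        System.out.println(split(url));
    }
}
```

Checking group(1) for null replaces the cnt counter from the snippet above and, unlike the fixed four-group pattern, works for any number of path segments.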
