How can I extract specific links in Wikipedia articles using jsoup?

Views: 235 | Published: 2018/7/11 17:41:30 | Tags: java, hyperlink, jsoup, wikipedia, extraction

Question

I am working on an NLP project and need to know how to extract only the links that appear in the "introduction" section and the "geography" section of this Wikipedia page: http://en.wikipedia.org/wiki/Boston

Can you help me?

Recommended answer

Wikipedia does not make this easy. I don't claim this to be elegant or even very reusable.

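    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // The wrapper class name is arbitrary; only the body of main() matters.
    public class BostonLinks {
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Boston").timeout(5000).get();

            // Introduction: the run of consecutive <p> siblings that starts at the first <p>.
            // The null check stops the loop cleanly if the last sibling is still a <p>.
            Element intro = doc.body().select("p").first();
            while (intro != null && intro.tagName().equals("p")) {
                // select("a") returns an Elements object which you can
                // iterate through to get the links in the intro
                System.out.println(intro.select("a"));
                intro = intro.nextElementSibling();
            }

            // Geography: the <p> siblings between the "Geography" <h2> and the next <h2>.
            // This relies on the heading containing two <span> elements,
            // with the section title in the second one.
            for (Element h2 : doc.body().select("h2")) {
                if (h2.select("span").size() == 2
                        && h2.select("span").get(1).text().equals("Geography")) {
                    Element nextsib = h2.nextElementSibling();
                    while (nextsib != null && !nextsib.tagName().equals("h2")) {
                        if (nextsib.tagName().equals("p")) {
                            // here you will get an Elements object which you
                            // can iterate through to get the links in the
                            // geography section
                            System.out.println(nextsib.select("a"));
                        }
                        nextsib = nextsib.nextElementSibling();
                    }
                }
            }
        }
    }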
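
The code above only prints the `Elements` objects. Each anchor's raw `href` can be read with `attr("href")`, and for an NLP pipeline it may help to turn those raw values into absolute article URLs. The following is a minimal sketch of that follow-up step using only the JDK. `WikiLinkFilter` and `articleLinks` are hypothetical names, and the filtering rules (same host, `/wiki/` path, no colon in the title, no in-page anchors) are assumptions about which links count as "specific", not something the original answer prescribes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of jsoup or of the original answer.
class WikiLinkFilter {

    /**
     * Resolves raw href values against the page URL and keeps only links that
     * look like ordinary article pages on the same host. Duplicates are
     * dropped and first-seen order is preserved.
     */
    static List<String> articleLinks(String pageUrl, List<String> hrefs) {
        URI base = URI.create(pageUrl);
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null || href.isEmpty() || href.startsWith("#")) {
                continue; // in-page anchors such as "#cite_note-5"
            }
            URI resolved;
            try {
                resolved = base.resolve(href);
            } catch (IllegalArgumentException e) {
                continue; // href is not a syntactically valid URI
            }
            String path = resolved.getPath();
            if (path == null || !path.startsWith("/wiki/")) {
                continue; // opaque URIs (mailto:) or non-article paths
            }
            if (base.getHost() == null || !base.getHost().equalsIgnoreCase(resolved.getHost())) {
                continue; // external site
            }
            String title = path.substring("/wiki/".length());
            // Assumption: a colon marks a namespace page such as "File:" or "Help:".
            // This also drops the rare article whose title contains a colon.
            if (title.isEmpty() || title.contains(":")) {
                continue;
            }
            // Rebuild the URL without query string or fragment.
            seen.add(resolved.getScheme() + "://" + resolved.getAuthority() + resolved.getRawPath());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "/wiki/Massachusetts", "#cite_note-5",
                "/wiki/File:Boston.jpg", "/wiki/New_England#History");
        for (String url : articleLinks("http://en.wikipedia.org/wiki/Boston", raw)) {
            System.out.println(url);
        }
    }
}
```

In the jsoup loops, the `hrefs` list could be filled by calling `attr("href")` on each element returned by `select("a")`.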
