Use Jsoup to get all href values from a specific class

Question

I was trying to parse my university website to get a list of news items (title + link) from the main site. However, because I'm parsing the full website, the links I'm looking for are nested deep inside other classes, tables, etc. Here's the code I tried:

String url = "http://www.portal.pwr.wroc.pl/index,241.dhtml";
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("table.cwrapper .tbody .tr td.ccol2 div.cwrapper_padd div#box_main_page_news.cbox.grey div#dyn_main_news.cbox.padd2 div.nitem table.nitemt .tbody .tr td.nitemcell2 span.title_1");
ArrayList<String> listOfLinks = new ArrayList<String>();
int counter = 0;

for (Element link : links) {
    listOfLinks.add(link.text());
}
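One likely problem in the selector above, separate from the URL issue discussed in the answer, is that `.tbody` and `.tr` are class selectors: they match elements with `class="tbody"` or `class="tr"`, not `<tbody>` and `<tr>` tags. The sketch below parses a small hypothetical HTML fragment (not the real university page; the class names are borrowed from the question) to show the difference, and that a short selector anchored on the `title_1` class is usually enough.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical fragment with an explicit <tbody>, loosely modelled on the question's markup
    static final String HTML =
            "<table class='nitemt'><tbody><tr>"
            + "<td class='nitemcell2'><span class='title_1'><a href='news/1'>First</a></span></td>"
            + "</tr></tbody></table>";

    // Counts how many elements in the fragment match the given CSS query
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // ".tbody .tr" looks for class="tbody" / class="tr", so nothing matches
        System.out.println(count("table.nitemt .tbody .tr td.nitemcell2 span.title_1"));
        // "tbody tr" are tag (type) selectors, so the span is found
        System.out.println(count("table.nitemt tbody tr td.nitemcell2 span.title_1"));
        // A short selector on the class alone also finds it
        System.out.println(count("span.title_1"));
    }
}
```

Long descendant chains copied from a browser inspector tend to break as soon as the page layout changes, so anchoring on the most specific class is generally the more robust choice.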

But it doesn't work. Is there a better way to get the href values and titles of all those links, given that every one of them is placed in:

<span class="title_1">
    <a href="Link Address">Link Title</a>
</span>

Maybe some kind of loop that would iterate over all of those tags and take the values from them?
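A loop like the one described can be sketched as follows, assuming each news item really is an `<a>` inside a `<span class="title_1">`. It iterates over the spans, takes the first link with an `href` inside each, and collects its text (title) and `href`. The input HTML in `main` is made up for illustration; `TitleLinks` and `extract` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TitleLinks {
    // Returns {title, href} pairs for every <span class="title_1"> that wraps a link
    static List<String[]> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> result = new ArrayList<>();
        for (Element span : doc.select("span.title_1")) {
            Element a = span.select("a[href]").first(); // null if the span holds no link
            if (a != null) {
                result.add(new String[] { a.text(), a.attr("href") });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical markup: two matching spans and one span with a different class
        String html = "<span class='title_1'><a href='news/1.html'>First item</a></span>"
                + "<span class='title_1'><a href='news/2.html'>Second item</a></span>"
                + "<span class='other'><a href='x.html'>Ignored</a></span>";
        for (String[] pair : extract(html)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Selecting the spans first (rather than the links directly) makes it easy to skip spans that happen to contain no link.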

Thanks for the help :-)

Answer

Your main problem is that the information you're looking for does not exist at the URL you're using, but at http://www.portal.pwr.wroc.pl/box_main_page_news,241.dhtml?limit=10.
You should fetch that page first, and then use this (it's a combination of Hovercraft's and Andrei Volgon's answers):

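import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The news list is served by this URL, not by the main index page
String url = "http://www.portal.pwr.wroc.pl/box_main_page_news,241.dhtml?limit=10";
String baseURL = "http://www.portal.pwr.wroc.pl/";
Document doc = Jsoup.connect(url).get();
// Every <a> that is a direct child of an element with class "title_1"
Elements links = doc.select(".title_1 > a");
for (Element link : links) {
    System.out.println("Title - " + link.text());
    // Assumes each href is relative to the site root
    System.out.println(baseURL + link.attr("href"));
}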
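Building links by string concatenation such as `baseURL + link.attr("href")` only works when every `href` is relative and does not start with `/`. jsoup can resolve links itself against the document's base URI with `absUrl("href")` (equivalently `attr("abs:href")`); `Jsoup.connect(url).get()` sets that base URI to the fetched page's URL automatically. The sketch below uses `Jsoup.parse(html, baseUri)` on an inline string so it runs without network access; the three hrefs are hypothetical examples of a relative, a root-relative, and an absolute link.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteLinks {
    static final String BASE = "http://www.portal.pwr.wroc.pl/";

    // Resolves the href of every ".title_1 > a" link against BASE
    static List<String> resolve(String html) {
        // The second argument sets the base URI that absUrl() resolves against
        Document doc = Jsoup.parse(html, BASE);
        List<String> urls = new ArrayList<>();
        for (Element link : doc.select(".title_1 > a")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical hrefs: relative, root-relative, and already absolute
        String html = "<span class='title_1'><a href='news,1.dhtml'>A</a></span>"
                + "<span class='title_1'><a href='/news,2.dhtml'>B</a></span>"
                + "<span class='title_1'><a href='http://example.com/c'>C</a></span>";
        for (String u : resolve(html)) {
            System.out.println(u);
        }
    }
}
```

With plain concatenation the second link would come out as `http://www.portal.pwr.wroc.pl//news,2.dhtml` and the third would be prefixed with the site root, while `absUrl` handles all three cases.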
