我如何用Jsoup解析这个HTML [英] How do I parse this HTML with Jsoup

查看:80
本文介绍了我如何用Jsoup解析这个HTML的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图提取了解你的拖拉机和壳牌石油公司1955?请记住,这只是整个代码的一部分,并且有多个H2 / H3标签。我想从所有H2和H3标签中获取数据。

继承HTML: http://我现在的代码是:

  ArrayList< String> arrayList = new ArrayList< String>(); 
Document doc = null;
尝试{

doc = Jsoup.connect(http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp .scps =范围%3A%28ALL%29&安培; frbg =安培;标签= default_tab&安培; dstmp = 1332103973502&安培; SRT =秩&安培; CT =搜索和安培;模式=基本&安培;达姆=真安培; INDX = 1&安培; TB = T&安培; VL(freeText0 )=拖拉机&安培; FN =搜索&安培; VID = ABN_VU1\" )得到();
元素标题= doc.select(h2.EXLResultTitle span);

for(Element src:heading){
String j = src.text();
System.out.println(j); //检查进入数组
arrayList.add(j);
}

我将如何提取了解您的拖拉机和壳牌石油公司。 ?感谢您的帮助!

您的选择器只会选择< span> 元素,它们在< h2 class =EXLResultTitle> 中,而实际上 需要那些< h2> ; 元素自己。因此,只需从选择器中移除 span 即可:

 元素标题= doc 。选择( h2.EXLResultTitle); (元素标题:标题){
System.out.println(heading.text());


}

您应该能够为< ; h3 class =EXLResultAuthor> 你自己根据学到的经验。 b $ b


I am trying to extract "Know your tractor" and "Shell Petroleum Company.1955"? Bear in mind that that is just a snippet of the whole code and there are more then one H2/H3 tag. And I would like to get the data from all the H2 and H3 tags.

Heres the HTML: http://i.stack.imgur.com/Pif3B.png

The Code I have just now is:

    ArrayList<String> arrayList  = new ArrayList<String>();
    Document doc = null;
 try{

     doc = Jsoup.connect("http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp.scps=scope%3A%28ALL%29&frbg=&tab=default_tab&dstmp=1332103973502&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=t&vl(freeText0)=tractor&fn=search&vid=ABN_VU1").get();
     Elements heading = doc.select("h2.EXLResultTitle span"); 

     for (Element src : heading) {
            String j = src.text();
            System.out.println(j);  //check whats going into the array 
            arrayList.add(j);
     }

How would I extract "Know your tractor" and "Shell Petroleum Company.1955"? Thanks for your help!

解决方案

Your selector only selects <span> elements which are inside <h2 class="EXLResultTitle">, while you actually need those <h2> elements themself. So, just remove span from the selector:

Elements headings = doc.select("h2.EXLResultTitle");

for (Element heading : headings) {
    System.out.println(heading.text());
}

You should be able to figure the selector for <h3 class="EXLResultAuthor"> yourself based on the lesson learnt.

See also:

这篇关于我如何用Jsoup解析这个HTML的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆