How do I parse this HTML with Jsoup

Views: 80   Posted: 2018/6/21 12:17:24   Tags: java html jsoup


Problem description


I am trying to extract "Know your tractor" and "Shell Petroleum Company.1955". Bear in mind that this is just a snippet of the whole code and there is more than one H2/H3 tag. I would like to get the data from all the H2 and H3 tags.

Here's the HTML: http://i.stack.imgur.com/Pif3B.png

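The code I have just now is:

    import java.io.IOException;
    import java.util.ArrayList;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    ArrayList<String> arrayList = new ArrayList<String>();
    Document doc = null;
    try {
        doc = Jsoup.connect("http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp.scps=scope%3A%28ALL%29&frbg=&tab=default_tab&dstmp=1332103973502&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=t&vl(freeText0)=tractor&fn=search&vid=ABN_VU1").get();
        Elements heading = doc.select("h2.EXLResultTitle span");
        for (Element src : heading) {
            String j = src.text();
            System.out.println(j); // check what's going into the array
            arrayList.add(j);
        }
    } catch (IOException e) {
        // Jsoup.connect(...).get() throws IOException on network or HTTP errors
        e.printStackTrace();
    }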
How would I extract "Know your tractor" and "Shell Petroleum Company.1955"? Thanks for your help!
Solution
Your selector only selects the <span> elements inside <h2 class="EXLResultTitle">, while you actually need the <h2> elements themselves. So, just remove span from the selector:

    Elements headings = doc.select("h2.EXLResultTitle");
    for (Element heading : headings) {
        System.out.println(heading.text());
    }

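To see why dropping span matters, here is a minimal, self-contained sketch that runs both selectors against a small inline HTML string instead of the live Primo page. The markup is hypothetical: the real structure is only in the screenshot linked in the question, and here it is assumed, purely for illustration, that a <span> wraps just part of each title (for example a highlighted search word). The second result, the class name TitleSelectorDemo and the helper texts() are likewise made up.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TitleSelectorDemo {
    // Hypothetical markup: a <span> wraps only part of each title.
    // The real Primo result page may be structured differently.
    static final String SAMPLE_HTML =
            "<div>"
            + "<h2 class=\"EXLResultTitle\"><a href=\"#\">Know your <span>tractor</span></a></h2>"
            + "<h3 class=\"EXLResultAuthor\">Shell Petroleum Company.1955</h3>"
            + "<h2 class=\"EXLResultTitle\"><a href=\"#\">Hypothetical second <span>tractor</span> title</a></h2>"
            + "<h3 class=\"EXLResultAuthor\">Hypothetical second author</h3>"
            + "</div>";

    // Hypothetical helper: collects the normalized text of every element matching cssQuery.
    static List<String> texts(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            result.add(element.text());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // Matches only the <span> inside each <h2>, so only a fragment of each title comes back.
        System.out.println(texts(doc, "h2.EXLResultTitle span")); // prints [tractor, tractor]
        // Matches the <h2> itself; text() includes the text of all its descendants.
        System.out.println(texts(doc, "h2.EXLResultTitle"));
    }
}
```

Because text() gathers the text of every descendant, selecting the <h2> returns the whole title even when parts of it sit inside nested tags.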
You should be able to figure out the selector for <h3 class="EXLResultAuthor"> yourself, based on the lesson learned.
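For reference, a minimal sketch of that step, again against hypothetical inline markup rather than the live page (the class name AuthorSelectorDemo, the helper texts() and the second result are made up for illustration). The selector h3.EXLResultAuthor picks up the author lines, and a grouped selector separated by a comma collects titles and authors in one pass. In the jsoup versions I am familiar with, a grouped selector returns matches in document order, so each title is followed by its author; it is worth verifying that against the jsoup version you use.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AuthorSelectorDemo {
    // Hypothetical markup modeled on the question; the real page may differ.
    static final String SAMPLE_HTML =
            "<div>"
            + "<h2 class=\"EXLResultTitle\"><a href=\"#\">Know your tractor</a></h2>"
            + "<h3 class=\"EXLResultAuthor\">Shell Petroleum Company.1955</h3>"
            + "<h2 class=\"EXLResultTitle\"><a href=\"#\">Hypothetical second title</a></h2>"
            + "<h3 class=\"EXLResultAuthor\">Hypothetical second author</h3>"
            + "</div>";

    // Hypothetical helper: collects the normalized text of every element matching cssQuery.
    static List<String> texts(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            result.add(element.text());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // Only the author lines.
        System.out.println(texts(doc, "h3.EXLResultAuthor"));
        // Titles and authors together, in the order they appear in the document.
        System.out.println(texts(doc, "h2.EXLResultTitle, h3.EXLResultAuthor"));
    }
}
```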

See also:

Jsoup cookbook - CSS selectors

Jsoup Selector API documentation
