我如何用Jsoup解析这个HTML [英] How do I parse this HTML with Jsoup
问题描述
继承HTML: http://我现在的代码是:
ArrayList< String> arrayList = new ArrayList< String>();
Document doc = null;
尝试{
doc = Jsoup.connect(http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp .scps =范围%3A%28ALL%29&安培; frbg =安培;标签= default_tab&安培; dstmp = 1332103973502&安培; SRT =秩&安培; CT =搜索和安培;模式=基本&安培;达姆=真安培; INDX = 1&安培; TB = T&安培; VL(freeText0 )=拖拉机&安培; FN =搜索&安培; VID = ABN_VU1\" )得到();
元素标题= doc.select(h2.EXLResultTitle span);
for(Element src:heading){
String j = src.text();
System.out.println(j); //检查进入数组
arrayList.add(j);
}
我将如何提取了解您的拖拉机和壳牌石油公司。 ?感谢您的帮助!
您的选择器只会选择< span>
元素,它们在< h2 class =EXLResultTitle>
中,而实际上 需要那些< h2> ;
元素自己。因此,只需从选择器中移除 span
即可:
元素标题= doc 。选择( h2.EXLResultTitle); (元素标题:标题){
System.out.println(heading.text());
}
您应该能够为< ; h3 class =EXLResultAuthor>
你自己根据学到的经验。 b $ b
I am trying to extract "Know your tractor" and "Shell Petroleum Company.1955"? Bear in mind that that is just a snippet of the whole code and there are more then one H2/H3 tag. And I would like to get the data from all the H2 and H3 tags.
Heres the HTML: http://i.stack.imgur.com/Pif3B.png
The Code I have just now is:
ArrayList<String> arrayList = new ArrayList<String>();
Document doc = null;
try{
doc = Jsoup.connect("http://primo.abdn.ac.uk:1701/primo_library/libweb/action/search.do?dscnt=0&scp.scps=scope%3A%28ALL%29&frbg=&tab=default_tab&dstmp=1332103973502&srt=rank&ct=search&mode=Basic&dum=true&indx=1&tb=t&vl(freeText0)=tractor&fn=search&vid=ABN_VU1").get();
Elements heading = doc.select("h2.EXLResultTitle span");
for (Element src : heading) {
String j = src.text();
System.out.println(j); //check whats going into the array
arrayList.add(j);
}
How would I extract "Know your tractor" and "Shell Petroleum Company.1955"? Thanks for your help!
Your selector only selects <span>
elements which are inside <h2 class="EXLResultTitle">
, while you actually need those <h2>
elements themself. So, just remove span
from the selector:
Elements headings = doc.select("h2.EXLResultTitle");
for (Element heading : headings) {
System.out.println(heading.text());
}
You should be able to figure the selector for <h3 class="EXLResultAuthor">
yourself based on the lesson learnt.
See also:
这篇关于我如何用Jsoup解析这个HTML的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!