How to parse the cells of the 3rd column of a table?
Question
I am trying to parse the cells of the 3rd column of a <table> using Jsoup.
Here is the HTML:
<b><table title="Avgångar:" class="tableMenuCell" cellspacing="0" cellpadding="4" border="0" id="GridViewForecasts" style="color:#333333;width:470px;border-collapse:collapse;">
<tr class="darkblue_pane" style="color:White;font-weight:bold;">
<th scope="col">Linje</th>
<th scope="col">Destination</th>
<th scope="col">Nästa tur (min)</th>
<th scope="col"> </th>
<th scope="col">Därefter</th>
<th scope="col"> </th>
</tr>
<tr class="white_pane" style="color:#333333;">
<td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
<td align="left">Hovshaga Kurortsv.</td><td align="right">55</td>
<td align="left"></td>
<td align="right">--</td>
<td align="left"></td>
</tr>
<tr class="lightblue_pane" style="color:#284775;">
<td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
<td align="left">Hovshaga via Resecentrum</td><td align="right">21</td>
<td align="left"></td><td align="right">--</td>
<td align="left"></td>
</tr>
<tr class="white_pane" style="color:#333333;">
<td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
<td align="left">Teleborg</td><td align="right">5</td>
<td align="left"></td><td align="right">45</td><td align="left"></td>
</tr>
</table></b>
Here is my code attempt, which throws a NullPointerException:
URL url = null;
try {
    url = new URL("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)");
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("1");
Document doc = null;
try {
    System.out.println("2");
    doc = Jsoup.parse(url, 3000);
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("3");
Element table = doc.select("table[title=Avgångar:]").first();
System.out.println("3");
Iterator<Element> it = table.select("td").iterator();
//we know the third td element is where we wanna start so we call .next twice
it.next();
it.next();
while(it.hasNext()){
    // do what ever you want with the td element here
    System.out.println("::::::::::"+it.next());
    //iterate three times to get to the next td you want. checking after the first
    // one to make sure
    // we're not at the end of the table.
    it.next();
    if(!it.hasNext()){
        break;
    }
    it.next();
    it.next();
}
It gets as far as the second System.out.println("3"); and then it gets stuck.
Recommended answer
This approach is quite a mess, and you didn't say on which line the NPE occurred, so it's hard to give a straight answer to your question.
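For what it's worth, the question's own output narrows it down somewhat: both "3" lines are printed, so `doc.select(...)` ran, and the exception most plausibly comes from `table.select("td")` on the following line, i.e. the selector matched nothing and `first()` returned null. That is an inference from the printed output, not something the question confirms. A minimal sketch of how to make such a failure point at itself: route the lookup through a small helper (`requireFirst` is a hypothetical name, not part of Jsoup) that throws with the offending selector instead of handing back null. It parses an abbreviated inline copy of the markup so it runs without the live site, and it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableLookup {

    // Hypothetical helper: returns the first element matching cssQuery, or throws
    // an exception naming the selector, so a failed lookup is reported where it
    // happens instead of as a NullPointerException a few lines later.
    static Element requireFirst(Element root, String cssQuery) {
        Element found = root.select(cssQuery).first();
        if (found == null) {
            throw new IllegalStateException("No element matches selector: " + cssQuery);
        }
        return found;
    }

    public static void main(String[] args) {
        // Abbreviated copy of the question's markup (test input, not the live page).
        Document doc = Jsoup.parse(
                "<table id=\"GridViewForecasts\"><tr><td>1</td><td>Teleborg</td><td>5</td></tr></table>");
        Element table = requireFirst(doc, "#GridViewForecasts");
        System.out.println(table.select("td").size()); // number of cells found

        try {
            requireFirst(doc, "#NoSuchTable");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // names the selector that matched nothing
        }
    }
}
```

Running `main` prints `3` followed by `No element matches selector: #NoSuchTable`.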
Apart from that, I would suggest not doing it the hard and error-prone way. As that <table> already has an id attribute, which is supposed to be unique throughout the document, just use the ID selector #someid. Further, you can get the cells of the 3rd column using the index selector :eq(index) (note: it's zero-based!).
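To see the zero-based index at work without depending on the live page, here is a sketch that parses an abbreviated copy of the question's table from a string (three data rows, first three columns only; the class and method names are illustrative) and reads two columns with `:eq()`. Jsoup's `:eq(n)` matches elements by their index among their siblings, so `td:eq(2)` picks the third cell of each row. It assumes Jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ColumnByIndex {

    // Abbreviated copy of the question's table (test input, not the live page).
    static final String HTML =
            "<table id=\"GridViewForecasts\">"
          + "<tr><th>Linje</th><th>Destination</th><th>N\u00e4sta tur (min)</th></tr>"
          + "<tr><td>1</td><td>Hovshaga Kurortsv.</td><td>55</td></tr>"
          + "<tr><td>1</td><td>Hovshaga via Resecentrum</td><td>21</td></tr>"
          + "<tr><td>1</td><td>Teleborg</td><td>5</td></tr>"
          + "</table>";

    // Collects the text of every td whose index among its siblings is `index` (zero-based).
    static List<String> column(Document doc, int index) {
        List<String> values = new ArrayList<>();
        for (Element cell : doc.select("#GridViewForecasts td:eq(" + index + ")")) {
            values.add(cell.text());
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(column(doc, 0)); // first column
        System.out.println(column(doc, 2)); // third column
    }
}
```

Running `main` prints `[1, 1, 1]` and then `[55, 21, 5]`; the header row contributes nothing because it contains `th` cells, not `td`.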
So, these few simple lines should do it:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = Jsoup.connect("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)").get();
Elements nextTurns = document.select("#GridViewForecasts td:eq(2)");
for (Element nextTurn : nextTurns) {
    System.out.println(nextTurn.text());
}
Here, this results in:
50
30
10
18
3
24
That's it.
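If you would rather keep an explicit loop like the one in the question, note that the question's loop walks a single flat iterator over every `td` in the table and reads every fourth cell. Since each data row in this table has six cells, that stride drifts into other columns (cells 2, 6, 10, 14 land in columns 2, 0, 4, 2). A sketch of a row-by-row alternative, assuming Jsoup on the classpath (class and method names are illustrative): take the `td` at a fixed zero-based position from each `tr`, skipping rows that are too short.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ColumnByRow {

    // Walks the table one row at a time and takes the cell at `index` (zero-based)
    // from each row, so the column position never drifts across row boundaries.
    static List<String> column(Document doc, String tableId, int index) {
        List<String> values = new ArrayList<>();
        Element table = doc.getElementById(tableId);
        if (table == null) {
            return values; // no such table: nothing to collect
        }
        for (Element row : table.select("tr")) {
            // Header rows contain only th cells, so they yield no td cells here.
            Elements cells = row.select("td");
            if (cells.size() > index) {
                values.add(cells.get(index).text());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Abbreviated copy of one row of the question's table (test input).
        String html = "<table id=\"GridViewForecasts\">"
                + "<tr><th>Linje</th><th>Destination</th><th>N\u00e4sta tur (min)</th></tr>"
                + "<tr><td>1</td><td>Teleborg</td><td>5</td><td></td><td>45</td><td></td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        System.out.println(column(doc, "GridViewForecasts", 2));
    }
}
```

Running `main` prints `[5]`. The `:eq(2)` selector does the same job in one line; the loop is only worth it if you need per-row logic.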
I strongly recommend investing some time in properly learning the CSS selector syntax, as Jsoup is built around it.
- Jsoup CSS selector syntax
- Jsoup Selector API
- W3C CSS3 selector specification