How to parse the cells of the 3rd column of a table?


Question

I am trying to parse the cells of the 3rd column of a <table> using Jsoup.

Here is the HTML:

<b><table title="Avgångar:" class="tableMenuCell" cellspacing="0" cellpadding="4" border="0" id="GridViewForecasts" style="color:#333333;width:470px;border-collapse:collapse;">
    <tr class="darkblue_pane" style="color:White;font-weight:bold;">
        <th scope="col">Linje</th>
        <th scope="col">Destination</th>
        <th scope="col">Nästa tur (min)</th>
        <th scope="col">&nbsp;</th>
        <th scope="col">Därefter</th>
        <th scope="col">&nbsp;</th>
    </tr>
    <tr class="white_pane" style="color:#333333;">
        <td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
        <td align="left">Hovshaga Kurortsv.</td><td align="right">55</td>
        <td align="left"></td>
        <td align="right">--</td>
        <td align="left"></td>

    </tr>
    <tr class="lightblue_pane" style="color:#284775;">
        <td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
        <td align="left">Hovshaga via Resecentrum</td><td align="right">21</td>
        <td align="left"></td><td align="right">--</td>
        <td align="left"></td>
    </tr>
    <tr class="white_pane" style="color:#333333;">
        <td align="right" style="color:#000000;background-color:#01AEF0;">1</td>
        <td align="left">Teleborg</td><td align="right">5</td>
        <td align="left"></td><td align="right">45</td><td align="left"></td>
    </tr>
</table></b>

Here is my code attempt, which throws a NullPointerException:

URL url = null;
try {
    url = new URL("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)");
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("1");
Document doc = null;
try {
    System.out.println("2");
    doc = Jsoup.parse(url, 3000);
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("3");
Element table = doc.select("table[title=Avgångar:]").first();
System.out.println("3");
Iterator<Element> it = table.select("td").iterator();

// We know the third td element is where we want to start, so we call next() twice.
it.next();
it.next();
while (it.hasNext()) {
    // Do whatever you want with the td element here.
    System.out.println("::::::::::" + it.next());
    // Iterate three times to get to the next td you want, checking after the
    // first one to make sure we're not at the end of the table.
    it.next();
    if (!it.hasNext()) {
        break;
    }
    it.next();
    it.next();
}

It runs up to the second System.out.println("3"); and then gets stuck.

Recommended answer

This approach is quite messy, and you didn't say on which line the NPE occurred, so it's hard to give a direct answer to your question.
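Separately from the NPE, the loop in the question advances four cells per pass, while each data row in the posted HTML has six cells, so it would drift off the third column. The following is a minimal sketch using plain Java lists instead of Jsoup; the class name FlatStrideDemo, the column helper, and the CELLS constant are made up for illustration and simply mirror the cell texts of the three posted rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlatStrideDemo {
    // Cell texts of the three data rows from the posted HTML, in document order.
    static final List<String> CELLS = Arrays.asList(
        "1", "Hovshaga Kurortsv.", "55", "", "--", "",
        "1", "Hovshaga via Resecentrum", "21", "", "--", "",
        "1", "Teleborg", "5", "", "45", "");

    // Picks the cell at zero-based position `column`, then every `stride`-th
    // cell after it, from a flat list of cells.
    static List<String> column(List<String> flatCells, int stride, int column) {
        List<String> result = new ArrayList<>();
        for (int i = column; i < flatCells.size(); i += stride) {
            result.add(flatCells.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stride of 4, as in the question's loop: drifts into other columns.
        System.out.println(column(CELLS, 4, 2)); // [55, 1, --, 5]
        // Stride of 6, matching the row width: stays on the third column.
        System.out.println(column(CELLS, 6, 2)); // [55, 21, 5]
    }
}
```

Even with the stride corrected, this style depends on every row having the same number of cells, which is one reason to prefer a selector that works per row.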

Apart from that, I would suggest not doing it the hard and error-prone way. Since that <table> already has an id attribute, which is supposed to be unique throughout the document, just use the ID selector #someid. Further, you can get the cells of the 3rd column using the index selector :eq(index) (note: it's zero-based!).

So, these few simple lines should do it:

Document document = Jsoup.connect("http://wap.nastabuss.se/its4wap/QueryForm.aspx?hpl=Teleborg+C+(V%C3%A4xj%C3%B6)").get();
Elements nextTurns = document.select("#GridViewForecasts td:eq(2)");

for (Element nextTurn : nextTurns) {
    System.out.println(nextTurn.text());
}

Here, this results in:

50
30
10
18
3
24

That's all.
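To try the selector without hitting the live page, the table from the question can be parsed from a string. This is only a sketch: the class name ThirdColumnDemo, the thirdColumn helper, and the trimmed HTML (only the id attribute is kept, and the header has been shortened) are assumptions made for illustration, and it needs the Jsoup library on the classpath.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ThirdColumnDemo {
    // Trimmed copy of the table from the question; attributes other than id are dropped.
    static final String HTML =
        "<table id='GridViewForecasts'>"
        + "<tr><th>Linje</th><th>Destination</th><th>N&auml;sta tur (min)</th></tr>"
        + "<tr><td>1</td><td>Hovshaga Kurortsv.</td><td>55</td>"
        + "<td></td><td>--</td><td></td></tr>"
        + "<tr><td>1</td><td>Hovshaga via Resecentrum</td><td>21</td>"
        + "<td></td><td>--</td><td></td></tr>"
        + "<tr><td>1</td><td>Teleborg</td><td>5</td>"
        + "<td></td><td>45</td><td></td></tr>"
        + "</table>";

    // Returns the text of every td that is the third child (zero-based index 2)
    // of its row inside the table with id GridViewForecasts.
    static List<String> thirdColumn(String html) {
        Document document = Jsoup.parse(html);
        Elements cells = document.select("#GridViewForecasts td:eq(2)");
        return cells.eachText();
    }

    public static void main(String[] args) {
        System.out.println(thirdColumn(HTML));
    }
}
```

With the three sample rows this should print [55, 21, 5]. The header row is skipped because its cells are th rather than td elements. If the id does not match anything, select returns an empty Elements collection rather than null, so the loop body simply never runs.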

I strongly recommend investing some time in properly learning the CSS selector syntax, as Jsoup is built around it.

  • Jsoup CSS selector syntax
  • Jsoup Selector API
  • W3 CSS3 selector specification
