如何解析具有多个表的页面 [英] How to parse a page with multiple tables

查看:59
本文介绍了如何解析具有多个表的页面的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

关于如何抓取具有多个表的网页的任何想法? 我正在连接到网页

Any idea on how to scrape a web page with multiple tables? I am connecting to the web page

这是一个表,但是在同一网页上有多个表

This is one table but on the same web page there are multiple tables

我也想不通如何阅读表格...

I also cant figure out how to read the table...

XML:

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p> 
<div class="storyStats"> 
<table> 
<thead> 
<tr> 
<th>RANK</th> 
<th>CENTRES</th> 
<th>TEAM</th> 
<th>POS</th> 
<th>GP</th> 
<th>G</th> 
<th>A</th> 
<th>PTS</th> 
<th>+/-</th> 
<th>PIM</th> 
<th>PPP</th> 
</tr> 
</thead> 
<tbody> 
<tr class="bg1"> 
<td>1.</td> 
<td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td> 

<td>Tampa Bay</td> 
<td>C</td> 
<td align="right">81</td> 
<td align="right">50</td> 
<td align="right">51</td> 
<td align="right">101</td> 
<td align="right">-2</td> 
<td align="right">56</td> 
<td align="right">38</td> 
</tr> 


Iterator<Element> trSIter = doc.select("table")
            .iterator();
    while (trSIter.hasNext()) {
        Element trEl = trSIter.next().child(0);
        Elements tdEls = trEl.children();
        Iterator<Element> tdIter = tdEls.select("tr").iterator();
        System.out.println("><1><><"+tdIter);
        boolean firstRow = true;
        while (tdIter.hasNext()) {

            Element tr = (Element) tdIter.next();


            while (tdIter.hasNext()) {
                int tdCount = 1;
                Element tdEl = tdIter.next();
                //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();

                Elements tdsEls = tdEl.select("td");
                System.out.println("><2><><"+tdsEls);
                Iterator<Element> columnIt = tdsEls.iterator();

                while (columnIt.hasNext()) {

                    Element column = columnIt.next();
                    switch (tdCount++) {
                    case 1:
                        name =column.select("a").first().text();

                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

推荐答案

这应该可以帮助您入门.每个表都有一个空白记录,您将不得不考虑.您还必须找出所需的统计信息以及它们在表中的位置.您可以通过tds.get()获得统计信息.让我知道它如何为您工作.

This should get you started. Each table has a blank record you will have to account for. You will also have to figure out which stats you want and where they are in the table. You get the stats with tds.get(). Let me know how it works for you.

    Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();

    for (Element table : doc.select("div.storyStats").select("table")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() > 0) {
                System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
            }
        }
    }

这篇关于如何解析具有多个表的页面的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆