How to parse a page with multiple tables using jsoup

Question

Any idea how to scrape a web page that contains multiple tables? I am connecting to the web page with jsoup.

This is one table, but there are several tables on the same page.

I also can't figure out how to read the table.

HTML:

    <p><a href="/fantasy_news/feature/?ID=49818"><strong>Top 300 Overall Fantasy Rankings</strong></a></p>
    <div class="storyStats">
      <table>
        <thead>
          <tr>
            <th>RANK</th>
            <th>CENTRES</th>
            <th>TEAM</th>
            <th>POS</th>
            <th>GP</th>
            <th>G</th>
            <th>A</th>
            <th>PTS</th>
            <th>+/-</th>
            <th>PIM</th>
            <th>PPP</th>
          </tr>
        </thead>
        <tbody>
          <tr class="bg1">
            <td>1.</td>
            <td><a href="/nhl/teams/players/?name=steven+stamkos">Steven&nbsp;Stamkos</a></td>
            <td>Tampa Bay</td>
            <td>C</td>
            <td align="right">81</td>
            <td align="right">50</td>
            <td align="right">51</td>
            <td align="right">101</td>
            <td align="right">-2</td>
            <td align="right">56</td>
            <td align="right">38</td>
          </tr>


Java:

    Iterator<Element> trSIter = doc.select("table").iterator();
    while (trSIter.hasNext()) {
        Element trEl = trSIter.next().child(0);
        Elements tdEls = trEl.children();
        Iterator<Element> tdIter = tdEls.select("tr").iterator();
        System.out.println("><1><><" + tdIter);
        boolean firstRow = true;
        while (tdIter.hasNext()) {
            Element tr = (Element) tdIter.next();
            while (tdIter.hasNext()) {
                int tdCount = 1;
                Element tdEl = tdIter.next();
                //name = tdEl.getElementsByClass("playertablePlayerName").get(0).text();
                Elements tdsEls = tdEl.select("td");
                System.out.println("><2><><" + tdsEls);
                Iterator<Element> columnIt = tdsEls.iterator();
                while (columnIt.hasNext()) {
                    Element column = columnIt.next();
                    switch (tdCount++) {
                    case 1:
                        name = column.select("a").first().text();
                        break;
                    case 2:
                        stat2 = Double.parseDouble(column.text());
                        break;
                    case 3:
                        stat3 = Double.parseDouble(column.text());
                        break;
                    case 4:
                        stat4 = Double.parseDouble(column.text());
                        break;
                    case 5:
                        stat5 = Double.parseDouble(column.text());
                        break;
                    case 6:
                        stat6 = Double.parseDouble(column.text());
                        break;
                    case 7:
                        stat7 = Double.parseDouble(column.text());
                        break;
                    case 8:
                        stat8 = Double.parseDouble(column.text());
                        break;

Solution

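This should get you started. Each table has a blank record that you will have to account for: a row that yields no td cells, such as the header row in your markup, which holds only th cells. The tds.size() check in the loop skips those rows. You will also have to decide which stats you want and find their column positions in the table; you then read each one with tds.get(index). Let me know how it works for you.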

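    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.connect("http://www.tsn.ca/fantasy_news/feature/?ID=49815").get();

    // Visit every table inside each div.storyStats block
    for (Element table : doc.select("div.storyStats").select("table")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            // Skip header rows (th only) and any row too short to index safely
            if (tds.size() > 5) {
                // Index 1 is the player name, index 5 is the G (goals) column
                System.out.println(tds.get(1).text() + ":" + tds.get(5).text());
            }
        }
    }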
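
If you need the values as data rather than printed lines, the same loop can collect one object per row and remember which table the row came from. The following is only a sketch under some assumptions: the class name StatsTableSketch, the PlayerRow holder, the parseTables method and the SAMPLE_HTML constant are all made up for illustration, and the second table in the sample contains placeholder values, not real stats. It parses an inline string with Jsoup.parse so it runs without a network connection, and it needs the jsoup jar on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class StatsTableSketch {

    // Inline sample so the sketch runs offline. The first table mirrors the
    // question's markup; the second table holds made-up placeholder values.
    public static final String SAMPLE_HTML =
        "<div class=\"storyStats\"><table>"
        + "<thead><tr><th>RANK</th><th>CENTRES</th><th>TEAM</th>"
        + "<th>POS</th><th>GP</th><th>G</th></tr></thead>"
        + "<tbody>"
        + "<tr><td></td></tr>" // stands in for the blank record
        + "<tr><td>1.</td><td><a href=\"#\">Steven&nbsp;Stamkos</a></td>"
        + "<td>Tampa Bay</td><td>C</td><td>81</td><td>50</td></tr>"
        + "</tbody></table></div>"
        + "<div class=\"storyStats\"><table>"
        + "<tbody>"
        + "<tr><td>1.</td><td><a href=\"#\">Player B</a></td>"
        + "<td>Team B</td><td>LW</td><td>10</td><td>3</td></tr>"
        + "</tbody></table></div>";

    /** Hypothetical holder for the columns this sketch reads. */
    public static final class PlayerRow {
        public final int tableIndex;
        public final String name;
        public final String team;
        public final int goals;

        public PlayerRow(int tableIndex, String name, String team, int goals) {
            this.tableIndex = tableIndex;
            this.name = name;
            this.team = team;
            this.goals = goals;
        }
    }

    /** Returns one PlayerRow per data row, across all tables in div.storyStats. */
    public static List<PlayerRow> parseTables(String html) {
        Document doc = Jsoup.parse(html);
        List<PlayerRow> rows = new ArrayList<>();
        int tableIndex = 0;
        for (Element table : doc.select("div.storyStats table")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() < 6) {
                    continue; // header rows (th only) and blank rows
                }
                // &nbsp; may survive as U+00A0 depending on the jsoup version
                String name = tds.get(1).text().replace('\u00a0', ' ').trim();
                String team = tds.get(2).text().trim();
                int goals;
                try {
                    goals = Integer.parseInt(tds.get(5).text().trim());
                } catch (NumberFormatException e) {
                    continue; // skip rows whose G cell is not a number
                }
                rows.add(new PlayerRow(tableIndex, name, team, goals));
            }
            tableIndex++;
        }
        return rows;
    }

    public static void main(String[] args) {
        for (PlayerRow r : parseTables(SAMPLE_HTML)) {
            System.out.println(r.tableIndex + " " + r.name + " (" + r.team + "): " + r.goals);
        }
    }
}
```

Against the live page you would pass the Document returned by Jsoup.connect(...).get() instead of parsing a string. Checking the cell count before calling tds.get(...) and guarding the number parsing keeps one odd row from aborting the whole scrape.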
