Iterate through all the rows in a table using python lxml xpath

Views: 561 Published: 2018/7/6 16:55:22 python xpath web-scraping html-table lxml


Problem description

This is the source code of the HTML page I want to extract data from.

Webpage: http://gbgfotboll.se/information/?scr=table&ftid=51168 (the table is at the bottom of the page)

    <html>
        <table class="clCommonGrid" cellspacing="0">
            <thead>
                <tr>
                    <td colspan="3">Kommande matcher</td>
                </tr>
                <tr>
                    <th style="width:1%;">Tid</th>
                    <th style="width:69%;">Match</th>
                    <th style="width:30%;">Arena</th>
                </tr>
            </thead>

            <tbody class="clGrid">
                <tr class="clTrOdd">
                    <td nowrap="nowrap" class="no-line-through">
                        <span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span>
                    </td>
                    <td><a href="?scr=result&amp;fmid=2669197">Guldhedens IK - IF Warta</a></td>
                    <td><a href="?scr=venue&amp;faid=847">Guldheden Södra 1 Konstgräs</a> </td>
                </tr>

                <tr class="clTrEven">
                    <td nowrap="nowrap" class="no-line-through">
                        <span class="matchTid"><span>2014-09-26<!-- br ok --> 13:00</span></span>
                    </td>
                    <td><a href="?scr=result&amp;fmid=2669176">Romelanda UF - IK Virgo</a></td>
                    <td><a href="?scr=venue&amp;faid=941">Romevi 1 Gräs</a> </td>
                </tr>

                <tr class="clTrOdd">
                    <td nowrap="nowrap" class="no-line-through">
                        <span class="matchTid"><span>2014-09-27<!-- br ok --> 13:00</span></span>
                    </td>
                    <td><a href="?scr=result&amp;fmid=2669167">Kode IF - IK Kongahälla</a></td>
                    <td><a href="?scr=venue&amp;faid=912">Kode IP 1 Gräs</a> </td>
                </tr>

                <tr class="clTrEven">
                    <td nowrap="nowrap" class="no-line-through">
                        <span class="matchTid"><span>2014-09-27<!-- br ok --> 14:00</span></span>
                    </td>
                    <td><a href="?scr=result&amp;fmid=2669147">Floda BoIF - Partille IF FK </a></td>
                    <td><a href="?scr=venue&amp;faid=218">Flodala IP 1</a> </td>
                </tr>
            </tbody>
        </table>
    </html>

Right now I have this code, which actually produces the result I want:

    import lxml.html

    url = "http://gbgfotboll.se/information/?scr=table&ftid=51168"
    html = lxml.html.parse(url)
    # The row count is hard-coded here; this is the fragile part the question is about.
    for i in range(12):
        # XPath positions are 1-based, hence i + 1.
        xpath1 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[1]/span/span//text()" % (i + 1)
        xpath2 = ".//*[@id='content-primary']/table[3]/tbody/tr[%d]/td[2]/a/text()" % (i + 1)
        # The inner span yields two text nodes: the date, then the time.
        time = html.xpath(xpath1)[1]
        date = html.xpath(xpath1)[0]
        teamName = html.xpath(xpath2)[0]
        if date == '2014-09-27':
            print(time, teamName)

This gives the result:

    13:00 Romelanda UF - IK Virgo
    13:00 Kode IF - IK Kongahälla
    14:00 Floda BoIF - Partille IF FK

Now to the question. I don't want to use a for loop with range because it's not stable: the number of rows in that table can change, and if the index goes out of bounds the code will crash. So my question is: how can I iterate, as I do here, in a safe way? That is, iterate through all the rows that are available in the table, no more, no less. Also, if you have any other suggestions for making the code better or faster, please go ahead.
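One possible approach, sketched here under some assumptions, is to select all the rows with a single XPath query and then evaluate relative paths against each row, so the loop runs exactly as many times as there are rows. The `SAMPLE` string and the `matches_on` helper are hypothetical names introduced for illustration; the sample is a trimmed copy of the table above so the sketch runs without network access, and the `//table[@class='clCommonGrid']` selector assumes the table of interest can be identified by that class.

```python
import lxml.html

# Hypothetical inline sample shaped like the table in the question, so the
# sketch runs without fetching the live page.
SAMPLE = """
<html><body><div id="content-primary">
<table class="clCommonGrid" cellspacing="0">
  <thead><tr><th>Tid</th><th>Match</th><th>Arena</th></tr></thead>
  <tbody class="clGrid">
    <tr class="clTrOdd">
      <td><span class="matchTid"><span>2014-09-26<!-- br ok --> 19:30</span></span></td>
      <td><a href="?scr=result&amp;fmid=2669197">Guldhedens IK - IF Warta</a></td>
      <td><a href="?scr=venue&amp;faid=847">Guldheden Södra 1 Konstgräs</a></td>
    </tr>
    <tr class="clTrEven">
      <td><span class="matchTid"><span>2014-09-27<!-- br ok --> 13:00</span></span></td>
      <td><a href="?scr=result&amp;fmid=2669167">Kode IF - IK Kongahälla</a></td>
      <td><a href="?scr=venue&amp;faid=912">Kode IP 1 Gräs</a></td>
    </tr>
  </tbody>
</table>
</div></body></html>
"""


def matches_on(root, wanted_date):
    """Return (time, match) pairs for every table row dated wanted_date."""
    results = []
    # One query returns however many rows exist, so there is no index to overrun.
    for row in root.xpath("//table[@class='clCommonGrid']/tbody/tr"):
        # Paths starting with "./" are evaluated relative to the current row.
        parts = [t.strip() for t in row.xpath("./td[1]/span/span//text()")]
        parts = [p for p in parts if p]
        names = row.xpath("./td[2]/a/text()")
        if len(parts) < 2 or not names:
            continue  # skip rows that lack the expected cells
        date, time = parts[0], parts[1]
        if date == wanted_date:
            results.append((time, names[0].strip()))
    return results


root = lxml.html.fromstring(SAMPLE)
for time, match in matches_on(root, "2014-09-27"):
    print(time, match)
```

Against the live page, `root` would presumably come from `lxml.html.parse(url).getroot()`, and the table selector may need narrowing (for example to the `.//*[@id='content-primary']/table[3]` path used in the question) if several tables share the same class. Querying each row once with relative paths also avoids re-scanning the whole document for every cell, which is likely the main speed gain over building a fresh absolute XPath per index.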
