Extracting information from a table on a website using Python, lxml & XPath

Views: 138. Published: 2020/5/4 8:39:11. Tags: python, python-2.7, parsing, xpath, lxml


Question

After a lot of hard work, I managed to extract some information that I needed from a table on this website:

http://gbgfotboll.se/serier/?scr=table&ftid=57108

From the "Kommande Matcher" table (the second table) I managed to extract the date and the team names.

But now I am totally stuck trying to extract the following from the first table:

The first column, "Lag"

The second column, "S"

The 6th column, "GM-IM"

The last column, "P"

Any ideas? Thanks.

Recommended answer

I just did it:

from io import BytesIO
import urllib2 as net
from lxml import etree
import lxml.html    

request = net.Request("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
response = net.urlopen(request)
data = response.read()

collected = []  # list of tuples: [(col1, col2, ...), (col1, col2, ...)]
dom = lxml.html.parse(BytesIO(data))
# all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval('//div[@id="content-primary"]/table[1]/tbody/tr')

for row in rows:
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text.encode("utf8"), # Lag
        columns[1].text, # S
        columns[5].text, # GM-IM
        columns[7].text, # P - last column
    ))

for i in collected: print i
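
The listing above targets Python 2.7 (it uses urllib2 and the print statement). For readers on Python 3, the same row-extraction logic can be sketched against an inline HTML fragment, so it runs without network access. The sample markup, the team names and numbers in it, and the helper name `extract_rows` are assumptions made up for illustration; they are not data from the site.

```python
import lxml.html

# Hypothetical sample markup imitating the layout the answer assumes:
# a div#content-primary whose first table has eight columns.
SAMPLE_HTML = """
<html><body>
<div id="content-primary">
  <table class="clCommonGrid clTblStandings">
    <thead>
      <tr><th>Lag</th><th>S</th><th>V</th><th>O</th>
          <th>F</th><th>GM-IM</th><th>MS</th><th>P</th></tr>
    </thead>
    <tbody>
      <tr><td><a href="#">Team A</a></td><td>3</td><td>2</td><td>1</td>
          <td>0</td><td>7-2</td><td>5</td><td>7</td></tr>
      <tr><td><a href="#">Team B</a></td><td>3</td><td>1</td><td>0</td>
          <td>2</td><td>3-6</td><td>-3</td><td>3</td></tr>
    </tbody>
  </table>
</div>
</body></html>
"""


def extract_rows(html_text):
    """Return (Lag, S, GM-IM, P) tuples from the first table in div#content-primary."""
    root = lxml.html.document_fromstring(html_text)
    # Same XPath as in the answer: body rows of the first table.
    rows = root.xpath('//div[@id="content-primary"]/table[1]/tbody/tr')
    collected = []
    for row in rows:
        columns = row.findall("td")
        collected.append((
            columns[0].find("a").text,  # Lag (team name inside the link)
            columns[1].text,            # S
            columns[5].text,            # GM-IM
            columns[-1].text,           # P, the last column
        ))
    return collected


standings = extract_rows(SAMPLE_HTML)
for entry in standings:
    print(entry)
```

One caveat worth keeping in mind: lxml does not insert a `tbody` element on its own the way browsers do, so the `/tbody/tr` step only matches when the served HTML contains an explicit `tbody`. If it does not, a more forgiving path such as `table[1]//tr[td]` may be a safer choice.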

You could pass the URL to lxml.html.parse() directly rather than calling urllib2. You could also select the target table by its class attribute, like this:

# new version
from lxml import etree
import lxml.html    

collected = []  # list of tuples: [(col1, col2, ...), (col1, col2, ...)]
dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=table&ftid=57108")
# all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval("""//div[@id="content-primary"]/table[
    contains(concat(" ", @class, " "), " clTblStandings ")]/tbody/tr""")

for row in rows:
    columns = row.findall("td")
    collected.append((
        columns[0].find("a").text.encode("utf8"), # Lag
        columns[1].text, # S
        columns[5].text, # GM-IM
        columns[7].text, # P - last column
    ))

for i in collected: print i
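
The `contains(concat(" ", @class, " "), " clTblStandings ")` expression in the listing above is a common XPath 1.0 idiom for matching one whole class name among several. Padding both the attribute value and the searched name with spaces keeps a plain substring test from matching longer class names. The following sketch compares the two approaches on a made-up fragment; the `id` values and the class name `clTblStandingsArchive` are hypothetical and exist only to show the difference.

```python
import lxml.html

# Hypothetical markup: three tables whose class attributes all contain
# the substring "clTblStandings", but only two carry it as a whole class.
HTML = """
<html><body>
  <table id="t1" class="clTblStandings"></table>
  <table id="t2" class="clCommonGrid clTblStandings wide"></table>
  <table id="t3" class="clTblStandingsArchive"></table>
</body></html>
"""

root = lxml.html.document_fromstring(HTML)

# Plain substring test: also matches the unrelated class on t3.
naive = root.xpath('//table[contains(@class, "clTblStandings")]/@id')

# Space-padded test: matches only whole, space-separated class names.
token = root.xpath(
    '//table[contains(concat(" ", @class, " "), " clTblStandings ")]/@id'
)

print(naive)
print(token)
```

The padded form assumes class names are separated by single spaces. If the markup might use tabs or line breaks inside the attribute, wrapping it as `normalize-space(@class)` before concatenating is a reasonable extra precaution.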
