Extracting lxml xpath for html table

Views: 136 · Published: 2018/6/13 15:53:33 · Tags: python html xpath html-table lxml


Question

I have an HTML document similar to the following:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml">
<div id="Symbols" class="cb">
<table class="quotes">
<tr><th>Code</th><th>Name</th>
<th style="text-align:right;">High</th>
<th style="text-align:right;">Low</th>
</tr>
<tr class="ro" onclick="location.href='/xyz.com/A.htm';" style="color:red;">
<td><a href="/xyz.com/A.htm" title="Display,A">A</a></td>
<td>A Inc.</td>
<td align="right">45.44</td>
<td align="right">44.26</td>
<tr class="re" onclick="location.href='/xyz.com/B.htm';" style="color:red;">
<td><a href="/xyz.com/B.htm" title="Display,B">B</a></td>
<td>B Inc.</td>
<td align="right">18.29</td>
<td align="right">17.92</td>
</div></html>

I need to extract the code/name/high/low information from the table.

I used the following code from one of the similar examples on Stack Overflow:

#############################
import urllib2
from lxml import html, etree

# Download the page and parse it into an lxml element tree (Python 2).
webpg = urllib2.urlopen('http://www.eoddata.com/stocklist/NYSE/A.htm').read()
table = html.fromstring(webpg)

# This is the XPath that returns no rows: it expects a tbody level.
for row in table.xpath('//table[@class="quotes"]/tbody/tr'):
    for column in row.xpath('./th[position()>0]/text() | ./td[position()=1]/a/text() | ./td[position()>1]/text()'):
        print column.strip(),
    print

#############################

I am getting no output. To get any rows, I have to change the first loop's XPath to table.xpath('//tr') from table.xpath('//table[@class="quotes"]/tbody/tr').

I just don't understand why xpath('//table[@class="quotes"]/tbody/tr') does not work.

Answer

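You are probably looking at the HTML in Firebug, correct? The browser inserts the implicit <tbody> tag when it is not present in the document. The lxml library only processes the tags present in the raw HTML string, so the tree it builds has no tbody level and the XPath matches nothing.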

Omit the tbody level in your XPath. For example, this works:

import lxml.html
tree = lxml.html.fromstring(raw_html)
tree.xpath('//table[@class="quotes"]/tr')
[<Element tr at 1014206d0>, <Element tr at 101420738>, <Element tr at 1014207a0>]
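To make the tbody point concrete, here is a minimal sketch in Python 3 (the question's own code is Python 2). It assumes lxml is installed. The inline RAW_HTML string and the extract_rows helper are hypothetical, modeled on the table in the question rather than taken from the real page. It compares the XPath with and without the tbody step, then uses a union so the same expression also works on pages whose source does include <tbody>.

```python
from lxml import html

# Hypothetical sample markup modeled on the question's table.
# Note that the source contains no <tbody> element.
RAW_HTML = """
<html><body>
<div id="Symbols" class="cb">
<table class="quotes">
<tr><th>Code</th><th>Name</th><th>High</th><th>Low</th></tr>
<tr class="ro"><td><a href="/xyz.com/A.htm">A</a></td><td>A Inc.</td>
<td align="right">45.44</td><td align="right">44.26</td></tr>
<tr class="re"><td><a href="/xyz.com/B.htm">B</a></td><td>B Inc.</td>
<td align="right">18.29</td><td align="right">17.92</td></tr>
</table>
</div>
</body></html>
"""

# Rows directly under the table, or under a tbody if the source has one.
ROW_XPATH = ('//table[@class="quotes"]/tr'
             ' | //table[@class="quotes"]/tbody/tr')


def extract_rows(raw_html):
    """Return each table row as a list of stripped cell strings."""
    tree = html.fromstring(raw_html)
    rows = []
    for tr in tree.xpath(ROW_XPATH):
        # text_content() also picks up text nested in <a> inside a cell.
        cells = [cell.text_content().strip() for cell in tr.xpath('./th | ./td')]
        rows.append(cells)
    return rows


tree = html.fromstring(RAW_HTML)
# lxml does not add the implicit tbody, so this list is empty.
with_tbody = tree.xpath('//table[@class="quotes"]/tbody/tr')
# Without the tbody step, the header row and both data rows match.
without_tbody = tree.xpath('//table[@class="quotes"]/tr')

rows = extract_rows(RAW_HTML)
for row in rows:
    print(" ".join(row))
```

The union is a design choice, not the only option: '//table[@class="quotes"]//tr' is shorter, but it would also pick up rows of any table nested inside the quotes table.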


                    