XPATH - html with a lot of children

Views: 172 Published: 2018/6/26 21:16:47 Tags: python html python-2.7 xpath web-scraping


Problem description


Consider the html in the page variable.

How do I access the tds?

I want to access them like xpath("/table/tr/td/text()")

I don't want to indicate the other trs

Unfortunately this expression xpath('.//table/tr/tr/tr/td/text()') doesn't work either.

Python code:

    import __future__
    from lxml import html
    import requests
    from bs4 import BeautifulSoup

    page = """
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>cv</title>
    </head>
    <body>

    <table>
        <tr>
            <tr>
                <tr>
                    <td>table1 td1</td>
                    <td>table1 td2</td>
                </tr>
            </tr>
        </tr>
    </table>

    <table>
        <tr>
            <tr>
                <tr>
                    <td>table2 td1</td>
                    <td>table2 td2</td>
                </tr>
            </tr>
        </tr>
    </table>

    <table>
        <tr>
            <tr>
                <tr>
                    <td>table3 td1</td>
                    <td>table3 td2</td>
                </tr>
            </tr>
        </tr>
    </table>
    </body>
    </html>
    """

    soup = str(BeautifulSoup(page, 'html.parser'))
    tree = html.fromstring(soup)

    # the expression that does not return the cells
    things = tree.xpath('.//table/tr/tr/tr/td/text()')

    print(things)

    for thing in things:
        print(thing)

    # double quotes, so the apostrophe does not end the string
    print("That's all")

I want it from the root!

Solution

Use the XPath //td/text():

    things = tree.xpath('//td/text()')

The //td means "find any td element at any depth".
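
To make the difference concrete, here is a minimal sketch that compares a fixed-depth path with the descendant search. The `snippet` markup and the variable names `fixed` and `anywhere` are made up for illustration and are not part of the question; the sketch assumes lxml is installed.

```python
from lxml import html

# Hypothetical markup: the cells of the second table sit one level
# deeper (inside a tbody) than the cells of the first table.
snippet = (
    "<div>"
    "<table><tr><td>a1</td><td>a2</td></tr></table>"
    "<table><tbody><tr><td>b1</td></tr></tbody></table>"
    "</div>"
)
tree = html.fromstring(snippet)

# Fixed-depth path: only matches cells reachable as table/tr/td.
fixed = tree.xpath('//table/tr/td/text()')

# Descendant search: matches every td, whatever lies in between.
anywhere = tree.xpath('//td/text()')

print(fixed)     # ['a1', 'a2']
print(anywhere)  # ['a1', 'a2', 'b1']
```

Because `//td` does not spell out the intermediate elements, it keeps working even if the parser rearranges invalid nesting such as a tr inside a tr.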

Works for me.

Printing td elements grouped per table:

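    doc = html.fromstring(page)
    for table_elm in doc.xpath("//table"):
        print("another table")
        # the leading "." keeps the search inside the current table
        things = table_elm.xpath('.//td/text()')
        print(things)

Note that in this case the leading . in the XPath is significant: it makes the expression relative to table_elm, whereas //td would search the whole document again.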
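
A small sketch of what the leading dot changes, using a shortened two-table page. The names `first_table`, `relative` and `absolute` are illustrative and not from the original answer; lxml is assumed to be installed.

```python
from lxml import html

# Shortened, well-formed version of the page: two tables, two cells each.
page = """<html><body>
<table><tr><td>table1 td1</td><td>table1 td2</td></tr></table>
<table><tr><td>table2 td1</td><td>table2 td2</td></tr></table>
</body></html>"""

doc = html.fromstring(page)
first_table = doc.xpath('//table')[0]

# Relative: starts at first_table, so only its own cells are returned.
relative = first_table.xpath('.//td/text()')

# Absolute: starts at the document root even though it is called on
# first_table, so the cells of every table are returned.
absolute = first_table.xpath('//td/text()')

print(relative)  # ['table1 td1', 'table1 td2']
print(absolute)  # ['table1 td1', 'table1 td2', 'table2 td1', 'table2 td2']
```

Without the dot, each iteration of the per-table loop would print the cells of all tables instead of one group per table.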
