XPATH - html with a lot of children
Problem description
Consider the HTML in the page variable.
How do I access the td elements?
I want to access them with something like xpath('/table/tr/td/text()').
I don't want to spell out the intermediate tr elements.
Unfortunately, the expression xpath('.//table/tr/tr/tr/td/text()')
doesn't work either.
Python code:
from lxml import html
from bs4 import BeautifulSoup
page = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>cv</title>
</head>
<body>
<table>
<tr>
<tr>
<tr>
<td>table1 td1</td>
<td>table1 td2</td>
</tr>
</tr>
</tr>
</table>
<table>
<tr>
<tr>
<tr>
<td>table2 td1</td>
<td>table2 td2</td>
</tr>
</tr>
</tr>
</table>
<table>
<tr>
<tr>
<tr>
<td>table3 td1</td>
<td>table3 td2</td>
</tr>
</tr>
</tr>
</table>
</body>
</html>
"""
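# Normalize the markup with BeautifulSoup, then parse the result with lxml
soup = str(BeautifulSoup(page, 'html.parser'))
tree = html.fromstring(soup)
# This is the expression that does not return the td text
things = tree.xpath('.//table/tr/tr/tr/td/text()')
print(things)
for thing in things:
    print(thing)
print("That's all")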
I want it from the root!
Solution
Use the XPath expression //td/text():
things = tree.xpath('//td/text()')
The //td stands for "find any td element at any depth".
Works for me.
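To illustrate the difference between an absolute child path and the // descendant search, here is a minimal sketch. The sample markup and the variable names (sample, cells, from_root) are made up for this illustration and are simpler than the page in the question; the string is parsed directly with lxml, without the BeautifulSoup step.

```python
from lxml import html

# Hypothetical sample markup (not the page from the question):
# one table directly under body, one nested inside a div.
sample = """
<html><body>
<table><tr><td>a1</td><td>a2</td></tr></table>
<div><table><tr><td>b1</td></tr></table></div>
</body></html>
"""

tree = html.fromstring(sample)

# '//td' matches td elements anywhere below the document root,
# no matter how many ancestors sit between the root and the cell.
cells = tree.xpath('//td/text()')
print(cells)

# '/table/...' requires table to be the document's root element.
# Here the root element is html, so nothing matches.
from_root = tree.xpath('/table/tr/td/text()')
print(from_root)
```

With this sample, the first expression should return the text of all three cells in document order, while the second returns an empty list.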
Printing the td elements grouped per table:
doc = html.fromstring(page)
for table_elm in doc.xpath("//table"):
    print("another table")
    things = table_elm.xpath('.//td/text()')
    print(things)
Note that in this case the leading . in the XPath expression is significant.
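The role of the leading dot can be checked with a small experiment. This is only a sketch on made-up markup (sample, first_table, relative and absolute are names chosen for this example): when xpath() is called on an element, './/td' searches below that element, whereas '//td' starts again from the document root.

```python
from lxml import html

# Hypothetical sample markup with two tables, one cell each.
sample = (
    "<html><body>"
    "<table><tr><td>t1 a</td></tr></table>"
    "<table><tr><td>t2 a</td></tr></table>"
    "</body></html>"
)

doc = html.fromstring(sample)
first_table = doc.xpath('//table')[0]

# Relative path: only td elements that are descendants of first_table.
relative = first_table.xpath('.//td/text()')

# Absolute path: evaluated from the document root, even though the
# call is made on first_table, so cells of every table are returned.
absolute = first_table.xpath('//td/text()')

print(relative)
print(absolute)
```

Without the dot, every iteration of the per-table loop would print the cells of all tables instead of only the current one.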