lxml XPath expression for selecting all text under a given child node, including its children
Question
Given the following XML:
<node1>
<text title='book'>
<div chapter='0'>
<div id='theNode'>
<p xml:id="40">
A House that has:
<p xml:id="45">- a window;</p>
<p xml:id="46">- a door</p>
<p xml:id="46">- a door</p>
its a beuatiful house
</p>
</div>
</div>
</text>
</node1>
I would like to locate the text element whose title is "book" and get all the text from the first p tag that appears inside it.
What I have so far:
from lxml import etree
XML_tree = etree.fromstring(XML_content,parser=parser)
text = XML_tree.xpath('//text[@title="book"]/div/div/p/text()')
This gets only the text that sits directly inside the outer p: "A House that has: its a beuatiful house"
But I would also like the text of all the children and grandchildren of the first p appearing under the text element with title "book".
Basically: look for that text element, then look for the first p, and give me all the text under that p tag, whatever the nesting.
Pseudocode:
text = XML_tree.xpath('//text[@title="book"]/... any number of nodes.../p/ ....all text under p')
Thanks.
Recommended answer
Try using either string() or normalize-space():
from lxml import etree
XML_content = """
<node1>
<text title='book'>
<div chapter='0'>
<div id='theNode'>
<p xml:id="x40">
A House that has:
<p xml:id="x45">- a window;</p>
<p xml:id="x46">- a door</p>
<p xml:id="x47">- a door</p>
its a beuatiful house
</p>
</div>
</div>
</text>
</node1>
"""
XML_tree = etree.fromstring(XML_content)
text = XML_tree.xpath('string(//text[@title="book"]/div/div/p)')
# text = XML_tree.xpath('normalize-space(//text[@title="book"]/div/div/p)')
print(text)
Output using string():
A House that has:
- a window;
- a door
- a door
its a beuatiful house
Output using normalize-space():
A House that has: - a window; - a door - a door its a beuatiful house
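The question's pseudocode also asks for "any number of nodes" between the text element and the p, which the fixed /div/div/p path does not cover. The sketch below is one possible way to handle that, not part of the original answer: it uses the descendant axis (//) and wraps the path in parentheses so that [1] picks the first p in document order. The variable names (first_p, pieces, via_itertext) are illustrative only.

```python
from lxml import etree

XML_content = """
<node1>
<text title='book'>
<div chapter='0'>
<div id='theNode'>
<p xml:id="x40">
A House that has:
<p xml:id="x45">- a window;</p>
<p xml:id="x46">- a door</p>
<p xml:id="x47">- a door</p>
its a beuatiful house
</p>
</div>
</div>
</text>
</node1>
"""

tree = etree.fromstring(XML_content)

# (...//p)[1] selects the first p, in document order, at any depth
# under the text element whose title is "book".
first_p = '(//text[@title="book"]//p)[1]'

# string() concatenates every descendant text node of that p.
as_string = tree.xpath(f'string({first_p})')

# //text() returns the same text nodes as a list rather than one string.
pieces = tree.xpath(f'{first_p}//text()')

# itertext() walks the same text through the element API instead of XPath.
element = tree.xpath(first_p)[0]
via_itertext = ''.join(element.itertext())

# Collapse whitespace in Python, similar in effect to normalize-space().
print(' '.join(as_string.split()))
```

All three approaches should yield the same text here; string() is convenient when a single string is wanted, while //text() or itertext() may be easier when each fragment needs separate processing.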