lxml XPath expression for selecting all text under a given child node, including its children

Views: 126 Published: 2020/10/1 19:00:41 Tags: python xml xpath lxml children

Question

Given the following XML:

<node1>
    <text title='book'>
       <div chapter='0'>
          <div id='theNode'>
              <p xml:id="40">
               A House that has:
                   <p xml:id="45">- a window;</p>
                   <p xml:id="46">- a door</p>
                   <p xml:id="46">- a door</p>
               its a beuatiful house
               </p>
          </div>
       </div>
    </text>
</node1>

I would like to locate the text element whose title is "book" and get all the text from the first p tag that appears inside it.

So far I have:

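from lxml import etree
# XML_content holds the XML shown above; parser is an lxml parser defined elsewhere
XML_tree = etree.fromstring(XML_content, parser=parser)
# text() selects only the text nodes that are direct children of the matched p
text = XML_tree.xpath('//text[@title="book"]/div/div/p/text()')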

This gets only the text directly inside the outer p: "A House that has: its a beuatiful house"

But I would also like all the text of every child and grandchild of the first <p> appearing under <text title="book">.

Basically: look for <text title="book">, then look for the first <p>, and give me all the text under that p tag, whatever the nesting.

Pseudocode:

text = XML_tree.xpath('//text[@title="book"]/... any number of nodes.../p/ ....all text under p')

Thanks.

Recommended answer

Try using either string() or normalize-space()...

from lxml import etree

XML_content = """
<node1>
    <text title='book'>
       <div chapter='0'>
          <div id='theNode'>
              <p xml:id="x40">
               A House that has:
                   <p xml:id="x45">- a window;</p>
                   <p xml:id="x46">- a door</p>
                   <p xml:id="x47">- a door</p>
               its a beuatiful house
               </p>
          </div>
       </div>
    </text>
</node1>
"""

XML_tree = etree.fromstring(XML_content)
text = XML_tree.xpath('string(//text[@title="book"]/div/div/p)')
# text = XML_tree.xpath('normalize-space(//text[@title="book"]/div/div/p)')
print(text)

Output using string()...


               A House that has:
                   - a window;
                   - a door
                   - a door
               its a beuatiful house

Output using normalize-space()...

A House that has: - a window; - a door - a door its a beuatiful house
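
The expressions above hard-code the /div/div/p path, while the question asks for "any number of nodes" between the text element and the first p. The following is only a sketch of one way to do that, assuming "first p" means the first p in document order at any depth below text[@title="book"]. The variable names first_p_path, raw_text, flat_text and joined_text are illustrative and not part of the original answer. It also shows that string() appears to match what you get by joining lxml's itertext() on the same element.

```python
from lxml import etree

# Same sample document as in the answer (unique, NCName-valid xml:id values).
XML_content = """
<node1>
    <text title='book'>
       <div chapter='0'>
          <div id='theNode'>
              <p xml:id="x40">
               A House that has:
                   <p xml:id="x45">- a window;</p>
                   <p xml:id="x46">- a door</p>
                   <p xml:id="x47">- a door</p>
               its a beuatiful house
               </p>
          </div>
       </div>
    </text>
</node1>
"""

XML_tree = etree.fromstring(XML_content)

# "//p" uses the descendant axis, so the p may sit at any depth below the
# text element. The parentheses make [1] pick the first match in document
# order, rather than every p that is the first p child of its own parent.
first_p_path = '(//text[@title="book"]//p)[1]'

# string() concatenates every descendant text node of the selected element.
raw_text = XML_tree.xpath(f'string({first_p_path})')

# normalize-space() does the same, then trims and collapses whitespace runs.
flat_text = XML_tree.xpath(f'normalize-space({first_p_path})')

# Pure-Python alternative: select the element, then join its text pieces.
first_p = XML_tree.xpath(first_p_path)[0]
joined_text = ''.join(first_p.itertext())

print(flat_text)
```

If the whitespace layout matters, raw_text or joined_text keeps it; flat_text is usually easier to compare or store.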
