Extract text between tags with XPath including markup

Views: 73 Published: 2021/10/2 19:36:03 Tags: python xpath


Problem description

I have the following piece of XML:

...<span class="st">In Tim <em>Power</em>: Politieman...</span>...

I want to extract the part between the <span> tags. For this I use XPath:

   /span[@class="st"]

This, however, extracts everything, including the <span> and </span> tags themselves, and

   /span[@class="st"]/text()

returns a list of two text nodes: one containing "In Tim " and the other ": Politieman...". The <em>Power</em> element is not included and is handled like a separator.

Is there a pure XPath solution which returns:

   In Tim <em>Power</em>: Politieman...

EDIT: Thanks to @helderdarocha and @TextGeek. It seems non-trivial to extract the text, including the <em> markup, with XPath alone.

The /span[@class="st"]/node() solution creates a list containing the individual nodes, from which it is trivial to build a string in Python.

Solution

To get any child node you can use:

   /span[@class="st"]/node()

This will return:

Two child text nodes

The full <em> node (element and contents).
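
The question's edit notes that joining the node list into a string is easy in Python. The following is a minimal sketch of that idea, not the asker's actual code. It uses only the standard library's xml.etree.ElementTree, whose limited XPath subset does not support node(), so the child nodes are walked by hand instead. The <root> wrapper and the helper name inner_markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample input; the <root> wrapper is an assumption so the fragment parses.
XML = '<root><span class="st">In Tim <em>Power</em>: Politieman...</span></root>'


def inner_markup(element):
    """Return an element's text and child markup, without its own tags.

    Hypothetical helper: it emulates joining the result of node().
    """
    parts = [element.text or ""]
    for child in element:
        # ET.tostring() serializes the child together with its tail text,
        # so the text that follows </em> stays in document order.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(XML)
span = root.find(".//span[@class='st']")
result = inner_markup(span)
print(result)
```

With a library that implements full XPath 1.0, the same join could be applied directly to the list returned by the node() expression.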

If you actually want all the text() nodes, including the ones inside em, then get all the text() descendants:

   /span[@class="st"]//text()

or

   /span[@class="st"]/descendant::text()

This will return three text nodes, including the text inside <em>, but not the <em> element itself.
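
To make the difference between child and descendant text nodes concrete, here is a small sketch using the standard library's xml.etree.ElementTree. Its XPath subset has no text() step, so span.text, child.tail and itertext() are used as rough stand-ins. The <root> wrapper and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample input; the <root> wrapper is an assumption so the fragment parses.
XML = '<root><span class="st">In Tim <em>Power</em>: Politieman...</span></root>'

root = ET.fromstring(XML)
span = root.find(".//span[@class='st']")

# Rough equivalent of text(): only the span's direct text children.
child_text = [span.text or ""] + [child.tail or "" for child in span]

# Rough equivalent of descendant::text(): every text node, in document order.
descendant_text = list(span.itertext())

# Joining the descendants gives the plain text with the <em> markup dropped.
plain_text = "".join(descendant_text)

print(child_text)
print(descendant_text)
print(plain_text)
```

The first list has two entries and the second has three, matching the counts described in the question and the answer.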
