Python ElementTree: extract text from an element, stripping tags

Views: 145 | Published: 2020/10/28 20:36:41 | Tags: python, xml-parsing, elementtree

Question

With ElementTree in Python, how can I extract all the text from a node, stripping any tags in that element and keeping only the text?

For example, say I have the following:

<tag>
  Some <a>example</a> text
</tag>

I want to return "Some example text". How do I go about doing this? The approaches I've tried so far have had fairly disastrous outcomes.

Recommended answer

If you are running Python 3.2+, you can use itertext (the method is also available in Python 2.7).

itertext creates a text iterator that loops over this element and all of its subelements, in document order, and yields all inner text:

import xml.etree.ElementTree as ET
xml = '<tag>Some <a>example</a> text</tag>'
tree = ET.fromstring(xml)
print(''.join(tree.itertext()))

# -> 'Some example text'
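One caveat worth illustrating: the element in the question spans several lines, and itertext returns the line breaks and indentation along with the words. The following is a minimal sketch, assuming that collapsing every run of whitespace into a single space is acceptable for your data; the variable names raw and normalized are illustrative only.

```python
import xml.etree.ElementTree as ET

# The multi-line element from the question, line breaks and indentation included.
xml = """<tag>
  Some <a>example</a> text
</tag>"""

elem = ET.fromstring(xml)

# itertext() yields every text fragment as-is, whitespace included.
raw = ''.join(elem.itertext())

# str.split() with no arguments splits on any run of whitespace, so joining
# the pieces with one space collapses newlines and indentation.
normalized = ' '.join(raw.split())

print(repr(raw))
print(normalized)
```

If the whitespace inside the element is significant (for example, preformatted text), skip the normalization step and use the joined string directly.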

If you are running an older version of Python, you can reuse the implementation of itertext() by attaching it to the Element class, after which you can call it exactly as above:

import xml.etree.ElementTree as ET

# original implementation of .itertext() for Python 2.7
# (basestring exists only in Python 2)
def itertext(self):
    tag = self.tag
    if not isinstance(tag, basestring) and tag is not None:
        return
    if self.text:
        yield self.text
    for e in self:
        for s in e.itertext():
            yield s
        if e.tail:
            yield e.tail

# if necessary, monkey-patch the Element class
if 'itertext' not in ET.Element.__dict__:
    ET.Element.itertext = itertext

xml = '<tag>Some <a>example</a> text</tag>'
tree = ET.fromstring(xml)
print(''.join(tree.itertext()))

# -> 'Some example text'
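The implementation above reads both .text and .tail because ElementTree stores the text that follows a closing tag on the child element, not on the parent. The sketch below shows where each fragment lives and rebuilds the full string by hand; collect_text is a hypothetical helper written for illustration, not part of the ElementTree API.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<tag>Some <a>example</a> text</tag>')
child = root[0]

print(repr(root.text))   # text before the first child element
print(repr(child.text))  # text inside <a>...</a>
print(repr(child.tail))  # text after </a>, stored on the child


def collect_text(elem):
    """Hypothetical helper: gather text and tail fragments in document order."""
    parts = [elem.text or '']
    for sub in elem:
        parts.append(collect_text(sub))  # everything inside the child
        parts.append(sub.tail or '')     # text between this child and the next node
    return ''.join(parts)


print(collect_text(root))
```

Reading root.text alone would return only the fragment before the first child, which is one common reason naive attempts lose part of the text.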
