Get text from mixed element xml tags with ElementTree

Views: 275 · Published: 2020/10/28 20:54:25 · Tags: python xml elementtree

Problem description

I'm using ElementTree to parse an XML document. I am getting the text from the u tags. Some of them have mixed content that I need to either filter out or keep as text. Here are two examples:

<u>
   <vocal type="filler">
     <desc>eh</desc>
   </vocal>¿Sí? 
</u>

<u>Pues... 
   <vocal type="non-ling">
     <desc>laugh</desc>
   </vocal>A mí no me suena. 
</u>

I want to get the text within the vocal tag if its type is filler, but not if its type is non-ling.

If I iterate through the children of u, the last bit of text is always lost. The only way I can reach it is by using itertext(), but then I lose the chance to check the type of the vocal tag.

How can I parse it so that I get a result like this:

eh ¿Sí? 
Pues... A mí no me suena.

Recommended answer

The lost text bits, "¿Sí?" and "A mí no me suena.", are available as the tail property of each <vocal> element (the text that follows the element's end tag). The text that precedes the first child is stored separately, in the text property of the <u> element itself.
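To make the text/tail distinction concrete, here is a minimal sketch written for Python 3. It inlines the second example from the question as a string (an assumption made for illustration; the original answer reads from a file) and prints where ElementTree stores each piece of text.

```python
from xml.etree import ElementTree as ET

# Second example from the question, inlined as a string for illustration.
u = ET.fromstring(
    '<u>Pues... <vocal type="non-ling"><desc>laugh</desc></vocal>'
    'A mí no me suena. </u>'
)
vocal = u.find("vocal")

print(repr(u.text))                  # text before the first child of <u>
print(repr(vocal.findtext("desc")))  # text inside <vocal>/<desc>
print(repr(vocal.tail))              # text after </vocal>, stored on the child
```

Iterating over the children of u only yields the <vocal> element, which is why the trailing text seems to disappear: it is attached to that child as its tail rather than to u.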

Here is one way to get the wanted output (tested with Python 2.7).

Assume that vocal.xml looks like this:

<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí? 
  </u>

  <u>Pues... 
     <vocal type="non-ling">
       <desc>laugh</desc>
     </vocal>A mí no me suena. 
  </u>
</root>

Code:

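from xml.etree import ElementTree as ET

# parse() returns an ElementTree; findall() searches from its root element
tree = ET.parse("vocal.xml")

for u in tree.findall(".//u"):
    v = u.find("vocal")  # assumes exactly one <vocal> child per <u>

    if v.get("type") == "filler":
        # keep the filler's description between the leading text and the tail
        frags = [u.text, v.findtext("desc"), v.tail]
    else:
        # skip the vocal's content; keep only the surrounding text
        frags = [u.text, v.tail]

    # Python 2 print statement; fragments are encoded to UTF-8 byte strings
    print " ".join(t.encode("utf-8").strip() for t in frags).strip()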

Output:

eh ¿Sí?
Pues... A mí no me suena.
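
The same idea can be sketched for Python 3 and for <u> elements with any number of children. This is an illustrative variant, not the code from the original answer: the helper name utterance_text is hypothetical, the XML is inlined as a string instead of being read from vocal.xml, and the sketch assumes that children other than filler vocals contribute only their tail text.

```python
from xml.etree import ElementTree as ET

def utterance_text(u):
    """Collect the text of a <u> element, keeping only 'filler' vocals."""
    frags = [u.text]  # text before the first child (may be None)
    for child in u:
        if child.tag == "vocal" and child.get("type") == "filler":
            frags.append(child.findtext("desc"))
        frags.append(child.tail)  # text after the child's end tag
    # Drop None and whitespace-only pieces, then join with single spaces.
    return " ".join(f.strip() for f in frags if f and f.strip())

xml_data = """<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí?
  </u>
  <u>Pues...
     <vocal type="non-ling">
       <desc>laugh</desc>
     </vocal>A mí no me suena.
  </u>
</root>"""

root = ET.fromstring(xml_data)
lines = [utterance_text(u) for u in root.findall(".//u")]
for line in lines:
    print(line)
```

Filtering out None and whitespace-only fragments before joining avoids the need for a final strip() and also tolerates a <u> element that has no leading text or a <vocal> without a <desc> child.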
