Issue reading text in XML using Python


Problem description

I am trying to read an XML file that has the following content:

<tu creationdate="20100624T160543Z" creationid="SYSTEM" usagecount="0">
  <prop type="x-source-tags">1=A,2=B</prop>
  <prop type="x-target-tags">1=A,2=B</prop>
  <tuv xml:lang="EN">
    <seg>Modified <ut x="1"/>Denver<ut x="2"/> Score</seg>
  </tuv>
  <tuv xml:lang="DE">
    <seg>Modifizierter <ut x="1"/>Denver<ut x="2"/>-Score</seg>
  </tuv>
</tu>

Using the following code:

import xml.etree.ElementTree as ET

tree = ET.parse(tmx)  # tmx holds the path to the XML file
root = tree.getroot()
seg = root.findall('.//seg')
for n in seg:
    print(n.text)

It gives the following output:

Modified
Modifizierter

What I expected is:

Modified Denver Score
Modifizierter Denver -Score

Can someone explain why only part of each seg is displayed?

Recommended answer

You need to learn about the tail property: http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html
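As a small illustration of how text and tail divide the content of an element, the sketch below parses a cut-down copy of the English seg from the question with the standard library's xml.etree.ElementTree and prints both attributes. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the English <seg> element from the question.
seg = ET.fromstring('<seg>Modified <ut x="1"/>Denver<ut x="2"/> Score</seg>')

# .text holds only the string that precedes the first child element.
print(repr(seg.text))

for ut in seg:
    # Each <ut/> is empty, so its .text is None.
    # The string that follows its closing tag is stored in its .tail.
    print(ut.get("x"), repr(ut.text), repr(ut.tail))
```

Printing with repr() makes the surrounding whitespace visible, which is why n.text in the question shows only the leading word of each segment.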

"Denver" is the tail of the first <ut> element and " Score" is the tail of the second <ut> element. These strings are not part of the text of the <seg> element.

In addition to the solution provided by kgbplus (which works with both ElementTree and lxml), with lxml you can also use the following methods to get the desired output:

  1. xpath()

     for n in seg:
         print("".join(n.xpath("text()")))

  2. itertext()

     for n in seg:
         print("".join(n.itertext()))
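itertext() is also available on elements in the standard library's xml.etree.ElementTree, so the second method can be tried without lxml. The following is a minimal self-contained sketch under that assumption; it embeds a shortened copy of the question's XML as a string (TMX_SNIPPET and segments are illustrative names) instead of reading a file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, embedded for a runnable example.
TMX_SNIPPET = """<tu creationdate="20100624T160543Z" creationid="SYSTEM" usagecount="0">
  <tuv xml:lang="EN">
    <seg>Modified <ut x="1"/>Denver<ut x="2"/> Score</seg>
  </tuv>
  <tuv xml:lang="DE">
    <seg>Modifizierter <ut x="1"/>Denver<ut x="2"/>-Score</seg>
  </tuv>
</tu>"""

root = ET.fromstring(TMX_SNIPPET)

# itertext() yields the element's text, then the text and tail of each
# descendant in document order; joining the pieces gives the full segment.
segments = ["".join(seg.itertext()) for seg in root.findall(".//seg")]

for line in segments:
    print(line)
```

The German segment comes out as "Modifizierter Denver-Score", without a space before the hyphen, because the source XML has none at that position.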
    
