Python ElementTree module: How to ignore the namespace of XML files to locate matching elements when using the methods "find", "findall"

Problem description

I want to use the "findall" method of the ElementTree module to locate some elements in a source XML file.

However, the source XML file (test.xml) declares a namespace. Here is a truncated part of the file as a sample:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>

Here is the sample Python code:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF")  # returns [] (no match)
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF")  # returns [<Element '{http://www.test.com}PAID_OFF' at 0xb78b90>]

Although this works, it is very inconvenient to prefix every tag with the namespace "{http://www.test.com}".

How can I ignore the namespace when using methods such as "find" and "findall"?

Solution

Instead of modifying the XML document itself, it's best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:

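from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO
import xml.etree.ElementTree as ET

# Read the document into a string (test.xml from the question)
with open("test.xml", encoding="iso-8859-1") as f:
    xml = f.read()

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
root = it.root  # set once the iterator has been fully consumed

print(root.findall("DEAL_LEVEL/PAID_OFF"))  # now matches without a prefix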
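The claim that this handles multiple namespaces and namespace aliases can be checked with a small, self-contained sketch. The helper name strip_namespaces and the sample document (with the made-up URIs http://a.example and http://b.example, one default and one bound to the alias "x") are hypothetical; the loop body is the same tag-stripping loop as in the answer.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def strip_namespaces(source):
    """Hypothetical helper: parse `source` (a file name or file object)
    and drop the '{uri}' prefix from every element tag."""
    it = ET.iterparse(source)
    for _, el in it:
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]  # keep only the local name
    return it.root  # available after the loop has consumed the iterator


# Sample data: a default namespace plus a second namespace aliased as "x"
doc = """<root xmlns="http://a.example" xmlns:x="http://b.example">
  <item><x:name>first</x:name></item>
  <item><x:name>second</x:name></item>
</root>"""

root = strip_namespaces(StringIO(doc))
names = [el.text for el in root.findall("item/name")]
print(names)  # ['first', 'second']
```

Note that the loop only rewrites element tags; namespaced attribute names (stored as "{uri}attr" keys) are left unchanged.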

This is based on the discussion here: http://bugs.python.org/issue18304
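If you would rather not rewrite the tags, find and findall can also be given the namespace directly. Below is a minimal sketch using the truncated sample from the question. The prefix "t" is an arbitrary choice that only exists in the mapping passed to findall, and the "{*}" wildcard (match a tag in any namespace, or none) assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Truncated sample from the question (default namespace http://www.test.com)
doc = """<XML_HEADER xmlns="http://www.test.com">
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
    </DEAL_LEVEL>
</XML_HEADER>"""
root = ET.fromstring(doc)

# Option 1: map a short prefix of your choosing to the namespace URI
ns = {"t": "http://www.test.com"}
by_prefix = root.findall("t:DEAL_LEVEL/t:PAID_OFF", ns)
print([el.text for el in by_prefix])  # ['N']

# Option 2 (Python 3.8+): "{*}" matches the local name in any namespace
by_wildcard = root.findall("{*}DEAL_LEVEL/{*}PAID_OFF")
print([el.text for el in by_wildcard])  # ['N']
```

The prefix mapping keeps the query explicit about which namespace is meant, while the wildcard is closer to "ignoring" the namespace but will also match same-named tags from other namespaces.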
