Parsing XML with namespace in Python via 'ElementTree'


Question

I have the following XML which I want to parse using Python's ElementTree:

<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">

    <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
        <rdfs:label xml:lang="en">basketball league</rdfs:label>
        <rdfs:comment xml:lang="en">
          a group of sports teams that compete against each other
          in Basketball
        </rdfs:comment>
    </owl:Class>

</rdf:RDF>

I want to find all owl:Class tags and then extract the value of all rdfs:label instances inside them. I am using the following code:

import xml.etree.ElementTree as ET

tree = ET.parse("filename")
root = tree.getroot()
root.findall('owl:Class')  # raises the SyntaxError shown below

Because of the namespace, I get the following error:

SyntaxError: prefix 'owl' not found in prefix map

I tried reading the document at http://effbot.org/zone/element-namespaces.htm but I still cannot get this working, since the above XML has multiple nested namespaces.

Kindly let me know how to change the code to find all the owl:Class tags.

Recommended answer

You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary:

namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed

root.findall('owl:Class', namespaces)

Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can use the same syntax yourself too, of course:

root.findall('{http://www.w3.org/2002/07/owl#}Class')
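Putting the pieces together for the original question (find every owl:Class, then read the rdfs:label values inside it) might look like the sketch below. It assumes a trimmed copy of the question's XML inlined as a string so that it runs on its own; the `labels` list is only an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch is self-contained.
XML = """<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">
    <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
        <rdfs:label xml:lang="en">basketball league</rdfs:label>
    </owl:Class>
</rdf:RDF>"""

# The prefixes used in the search paths; they only need to match this dict,
# not the prefixes used inside the document.
namespaces = {
    'owl': 'http://www.w3.org/2002/07/owl#',
    'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
}

root = ET.fromstring(XML)

labels = []
for cls in root.findall('owl:Class', namespaces):
    # The same dictionary has to be passed on every namespaced lookup.
    for label in cls.findall('rdfs:label', namespaces):
        labels.append(label.text)

print(labels)
```

Note that the namespaces dictionary is passed to each call; ElementTree does not remember it between lookups.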

Also see the Parsing XML with Namespaces section of the ElementTree documentation.

If you can switch to the lxml library, things are better: that library supports the same ElementTree API, but collects namespaces for you in the .nsmap attribute on elements and generally has superior namespace support.
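If switching to lxml is not an option, a rough standard-library approximation of a collected prefix map is to record the xmlns declarations while parsing, using the start-ns event of ET.iterparse. This is a stdlib workaround rather than lxml's API, and `parse_with_nsmap` is a hypothetical helper name introduced only for this sketch. It also assumes no prefix is rebound to a different URI deeper in the document, since a later declaration would overwrite an earlier one.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, as bytes for iterparse.
XML = b"""<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">
    <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
        <rdfs:label xml:lang="en">basketball league</rdfs:label>
    </owl:Class>
</rdf:RDF>"""


def parse_with_nsmap(source):
    """Parse `source` and return (root, prefix -> URI dict).

    Hypothetical helper: the dict is built from the xmlns declarations
    seen during parsing; the default namespace is stored under ''.
    """
    nsmap = {}
    root = None
    for event, payload in ET.iterparse(source, events=('start', 'start-ns')):
        if event == 'start-ns':
            prefix, uri = payload  # payload is a (prefix, uri) tuple
            nsmap[prefix] = uri
        elif root is None:
            root = payload  # the first 'start' event is the root element
    return root, nsmap


root, nsmap = parse_with_nsmap(io.BytesIO(XML))
classes = root.findall('owl:Class', nsmap)
print(sorted(nsmap))
print(len(classes))
```

The collected dictionary can then be passed straight to .find(), .findall() and .iterfind(), so the prefixes used in the search paths match the ones declared in the file.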
