Parsing XML with namespace in Python via 'ElementTree'
Question
I have the following XML which I want to parse using Python's ElementTree:
<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">
  <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
    <rdfs:label xml:lang="en">basketball league</rdfs:label>
    <rdfs:comment xml:lang="en">
      a group of sports teams that compete against each other
      in Basketball
    </rdfs:comment>
  </owl:Class>
</rdf:RDF>
I want to find all owl:Class tags and then extract the value of all rdfs:label instances inside them. I am using the following code:
import xml.etree.ElementTree as ET

tree = ET.parse("filename")
root = tree.getroot()
root.findall('owl:Class')
Because of the namespace, I am getting the following error.
SyntaxError: prefix 'owl' not found in prefix map
I tried reading the document at http://effbot.org/zone/element-namespaces.htm but I am still not able to get this working since the above XML has multiple nested namespaces.
Kindly let me know how to change the code to find all the owl:Class tags.
Recommended answer
You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
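Applied to the original goal (every owl:Class plus the text of its rdfs:label children), a minimal sketch might look like the following. It assumes the sample document is held in a string; the names RDF_XML, NAMESPACES and class_labels are hypothetical, chosen for illustration. Both the owl and rdfs prefixes are mapped, because the labels live in the rdfs namespace. The './/' prefix searches all descendants; in this sample owl:Class is a direct child of the root, so plain 'owl:Class' would also work.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined as a string (hypothetical name).
RDF_XML = """<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">
  <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
    <rdfs:label xml:lang="en">basketball league</rdfs:label>
    <rdfs:comment xml:lang="en">
      a group of sports teams that compete against each other
      in Basketball
    </rdfs:comment>
  </owl:Class>
</rdf:RDF>"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Prefix -> namespace URI map used by find/findall/iterfind; the prefixes are ours to choose.
NAMESPACES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def class_labels(root):
    """Map each owl:Class's rdf:about URI to the texts of its rdfs:label children."""
    result = {}
    # './/' searches the whole tree, not only the root's direct children.
    for cls in root.iterfind(".//owl:Class", NAMESPACES):
        # Attribute names are not expanded via the prefix map, so use {uri}name.
        about = cls.get("{%s}about" % RDF)
        result[about] = [label.text for label in cls.findall("rdfs:label", NAMESPACES)]
    return result


root = ET.fromstring(RDF_XML)
print(class_labels(root))
# {'http://dbpedia.org/ontology/BasketballLeague': ['basketball league']}
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(RDF_XML).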
Prefixes are only looked up in the namespaces parameter you pass in, not in the xmlns declarations of the document itself. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URI in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can of course use the same syntax yourself:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
If you can switch to the lxml library, things are better; that library supports the same ElementTree API, but collects namespaces for you in the .nsmap attribute on elements and generally has superior namespace support.
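If adding lxml is not an option, a rough standard-library stand-in for .nsmap is to collect the declarations yourself with ET.iterparse and its "start-ns" event, which reports each (prefix, uri) pair as it is declared. This is a sketch under stated assumptions, not an lxml equivalent: it flattens every declaration in the document into one dict (a prefix redeclared deeper in the tree overwrites the earlier URI), and the default namespace appears under the empty-string prefix. The helper name collect_namespaces is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def collect_namespaces(source):
    """Return a dict of every prefix -> URI declared in an XML file or file-like object."""
    namespaces = {}
    # "start-ns" events yield (prefix, uri) tuples; the default namespace has prefix "".
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        namespaces[prefix] = uri
    return namespaces


sample = io.BytesIO(
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'xmlns:owl="http://www.w3.org/2002/07/owl#" '
    b'xmlns="http://dbpedia.org/ontology/"><owl:Class/></rdf:RDF>'
)
ns = collect_namespaces(sample)
print(ns["owl"])  # http://www.w3.org/2002/07/owl#
```

Because iterparse consumes the input, the document has to be parsed a second time (for example with ET.parse) before the collected dict can be passed as the namespaces argument of findall().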