Parsing XML with namespace in Python via 'ElementTree'
Question
I have the following XML which I want to parse using Python's ElementTree:
<rdf:RDF xml:base="http://dbpedia.org/ontology/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns="http://dbpedia.org/ontology/">
<owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
<rdfs:label xml:lang="en">basketball league</rdfs:label>
<rdfs:comment xml:lang="en">
a group of sports teams that compete against each other
in Basketball
</rdfs:comment>
</owl:Class>
</rdf:RDF>
I want to find all owl:Class tags and then extract the value of all rdfs:label instances inside them. I am using the following code:
import xml.etree.ElementTree as ET

tree = ET.parse("filename")
root = tree.getroot()
root.findall('owl:Class')
Because of the namespace, I am getting the following error:
SyntaxError: prefix 'owl' not found in prefix map
I tried reading the document at http://effbot.org/zone/element-namespaces.htm but I still cannot get this working, since the above XML has multiple nested namespaces.
Kindly let me know how to change the code to find all the owl:Class tags.
Recommended answer
ElementTree is not too smart about namespaces. You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary. This is not documented very well:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
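Applied to the question's document, the same idea can be sketched end to end. This is only an illustration: the XML is a trimmed copy inlined as a string so the snippet runs on its own, and the rdfs entry in the dictionary is added here because the question also asks for rdfs:label.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document, inlined so the sketch is self-contained.
XML = """<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">
  <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
    <rdfs:label xml:lang="en">basketball league</rdfs:label>
  </owl:Class>
</rdf:RDF>"""

# One entry per prefix used in the search expressions below.
namespaces = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

root = ET.fromstring(XML)

# Find every owl:Class child of the root, then every rdfs:label inside it.
labels = []
for owl_class in root.findall("owl:Class", namespaces):
    for label in owl_class.findall("rdfs:label", namespaces):
        labels.append(label.text)

print(labels)
```

For a file on disk, ET.parse("filename").getroot() would take the place of ET.fromstring(XML).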
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can use the same syntax yourself too of course:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
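A small check may make the equivalence concrete. The document below is a made-up two-element example, and the prefix "o" is deliberately different from the one in the XML to show that only the dictionary passed to findall() matters.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical minimal document, used only for this illustration.
XML = f'<root xmlns:owl="{OWL}"><owl:Class/><owl:Class/></root>'
root = ET.fromstring(XML)

# Search with an arbitrary prefix that is mapped to the OWL namespace URI.
by_prefix = root.findall("o:Class", {"o": OWL})

# Search with the expanded {uri}localname form, no dictionary needed.
by_uri = root.findall(f"{{{OWL}}}Class")

# Parsed elements carry the expanded form in their .tag attribute.
tags = [el.tag for el in by_uri]
print(tags)
```

Both calls return the same two elements, and each element's .tag already holds the {uri}localname form, which is why that form works directly as a search expression.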
If you can switch to the lxml library, things are better; that library supports the same ElementTree API, but collects namespaces for you in a .nsmap attribute on elements.
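If switching libraries is not an option, a comparable prefix map can be gathered with the standard library alone by listening for "start-ns" events in ET.iterparse(). The sketch below is a rough stand-in, not lxml's .nsmap: collect_namespaces is a hypothetical helper name, the XML is a trimmed inline copy of the question's document, and later declarations of the same prefix simply overwrite earlier ones.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document, as bytes so it can be read like a file.
XML = b"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
    <rdfs:label xml:lang="en">basketball league</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def collect_namespaces(source):
    """Hypothetical helper: build a {prefix: uri} dict from xmlns declarations."""
    nsmap = {}
    # Each "start-ns" event yields a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        nsmap[prefix] = uri
    return nsmap


nsmap = collect_namespaces(io.BytesIO(XML))

# The collected map can then be passed to the search methods as before.
root = ET.fromstring(XML)
classes = root.findall("owl:Class", nsmap)
print(nsmap["owl"], len(classes))
```

This reads the document twice (once for the declarations, once for the tree), which keeps the sketch simple at the cost of some extra parsing.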