Parsing XML with namespace in Python via 'ElementTree'

Question

I have the following XML which I want to parse using Python's ElementTree:

<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">

    <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
        <rdfs:label xml:lang="en">basketball league</rdfs:label>
        <rdfs:comment xml:lang="en">
          a group of sports teams that compete against each other
          in Basketball
        </rdfs:comment>
    </owl:Class>

</rdf:RDF>

I want to find all owl:Class tags and then extract the value of all rdfs:label instances inside them. I am using the following code:

import xml.etree.ElementTree as ET

tree = ET.parse("filename")
root = tree.getroot()
root.findall('owl:Class')  # no namespace map is passed, so the 'owl' prefix is unknown

Because of the namespace, I am getting the following error.

SyntaxError: prefix 'owl' not found in prefix map

I tried reading the document at http://effbot.org/zone/element-namespaces.htm but I am still not able to get this working since the above XML has multiple nested namespaces.

Kindly let me know how to change the code to find all the owl:Class tags.

Recommended answer

ElementTree is not too smart about namespaces. You need to give the .find(), .findall() and .iterfind() methods an explicit namespace dictionary. This is not documented very well:

namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed

root.findall('owl:Class', namespaces)
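
To cover the second half of the question (extracting the rdfs:label values), the same dictionary can carry a second prefix. The following is a minimal sketch, assuming a trimmed copy of the question's XML held in a string (the XML and labels names are illustrative, not from the original post):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question, inlined for the example.
XML = """<rdf:RDF xml:base="http://dbpedia.org/ontology/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://dbpedia.org/ontology/">
    <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
        <rdfs:label xml:lang="en">basketball league</rdfs:label>
    </owl:Class>
</rdf:RDF>"""

# One entry per prefix used in the search paths below.
namespaces = {
    'owl': 'http://www.w3.org/2002/07/owl#',
    'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
}

root = ET.fromstring(XML)

# Find every owl:Class child of the root, then every rdfs:label inside it.
labels = [
    label.text
    for cls in root.findall('owl:Class', namespaces)
    for label in cls.findall('rdfs:label', namespaces)
]
print(labels)  # ['basketball league']
```

With a file on disk, ET.parse("filename").getroot() would take the place of ET.fromstring(XML).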

Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl#}Class instead. You can of course use the same syntax yourself:

root.findall('{http://www.w3.org/2002/07/owl#}Class')
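
As an illustration of why the prefix itself does not matter, here is a small sketch in which the document deliberately uses different prefixes (o: and r:) for the same namespace URIs. The sample document and the constant names OWL and RDFS are assumptions made for the example:

```python
import xml.etree.ElementTree as ET

OWL = 'http://www.w3.org/2002/07/owl#'
RDFS = 'http://www.w3.org/2000/01/rdf-schema#'

# Hypothetical document that binds the same URIs to unusual prefixes.
doc = ET.fromstring(
    '<root xmlns:o="' + OWL + '" xmlns:r="' + RDFS + '">'
    '<o:Class><r:label>basketball league</r:label></o:Class>'
    '</root>'
)

# The {uri}localname form matches on the URI, so no prefix map is needed.
classes = doc.findall('{' + OWL + '}Class')
first_tag = classes[0].tag

# Element.iter() accepts the same qualified-name form and searches all descendants.
label_texts = [el.text for el in doc.iter('{' + RDFS + '}label')]

print(first_tag)    # {http://www.w3.org/2002/07/owl#}Class
print(label_texts)  # ['basketball league']
```

ElementTree stores every tag in this {uri}localname form internally, which is why the prefix written in the file never appears in element.tag.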

If you can switch to the lxml library, things are better: that library supports the same ElementTree API, but collects namespaces for you in a .nsmap attribute on elements.
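
If installing lxml is not an option, something similar can be approximated with the standard library alone. The sketch below is not lxml's .nsmap; it is a swapped-in technique that uses the 'start-ns' events of ET.iterparse to gather the prefix declarations. The helper name collect_namespaces and the sample document are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def collect_namespaces(source):
    """Return {prefix: uri} for the namespace declarations found in source.

    The default namespace (empty prefix) is skipped, and a prefix that is
    declared more than once keeps the URI of its last declaration.
    """
    found = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=('start-ns',)):
        if prefix:
            found[prefix] = uri
    return found


# Hypothetical in-memory document standing in for the file on disk.
sample = io.BytesIO(
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    b' xmlns:owl="http://www.w3.org/2002/07/owl#"'
    b' xmlns="http://dbpedia.org/ontology/">'
    b'<owl:Class/>'
    b'</rdf:RDF>'
)

ns = collect_namespaces(sample)

# Rewind and parse normally, then search with the collected prefix map.
sample.seek(0)
root = ET.parse(sample).getroot()
found_classes = root.findall('owl:Class', ns)
print(sorted(ns))          # ['owl', 'rdf']
print(len(found_classes))  # 1
```

This reads the document twice, which is acceptable for small files; for large inputs a single iterparse pass that handles both 'start-ns' and element events would be the more economical design.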
