Parsing XML with Python - accessing elements
Question
I'm using lxml to parse some xml, but for some reason I can't find a specific element.
I'm trying to access the <Constant> elements.
Here's an xml snippet:
</rdf:Description>
</rdf:RDF>
</MiriamAnnotation>
<ListOfSubstrates>
<Substrate metabolite="Metabolite_5" stoichiometry="1"/>
</ListOfSubstrates>
<ListOfModifiers>
<Modifier metabolite="Metabolite_9" stoichiometry="1"/>
</ListOfModifiers>
<ListOfConstants>
<Constant key="Parameter_4344" name="Kcat" value="433.724"/>
<Constant key="Parameter_4343" name="km" value="479.617"/>
The code I'm using is like this:
>>> from lxml import etree as ET
>>> parsed = ET.parse('ct.cps')
>>> root = parsed.getroot()
>>> for a in root.findall(".//Constant"):
...     print(a.attrib['key'])
...
>>> for a in root.findall('Constant'):
...     print(a.get('key'))
...
>>> for a in root.findall('Constant'):
...     print(a.attrib['key'])
...
As you can see, none of these things seem to work.
What am I doing wrong? I'm wondering if it has something to do with the fact that the <Constant> elements are empty?
Source xml here: https://www.dropbox.com/s/i6hga7nvmcd6rxx/ct.cps?dl=0
Recommended answer
Here is how you can get the values you are looking for:
from lxml import etree

parsed = etree.parse('ct.cps')
# Qualify the tag with the document's default namespace, in {uri}name form.
for a in parsed.findall(".//{http://www.copasi.org/static/schema}Constant"):
    print(a.attrib["key"])
Output:
Parameter_4344
Parameter_4343
Parameter_4342
Parameter_4341
Parameter_4340
Parameter_4339
Parameter_4338
Parameter_4337
Parameter_4336
Parameter_4335
Parameter_4334
Parameter_4333
Parameter_4332
Parameter_4331
Parameter_4330
Parameter_4329
Parameter_4328
Parameter_4327
Parameter_4326
Parameter_4325
Parameter_4324
Parameter_4323
Parameter_4322
Parameter_4321
Parameter_4320
Parameter_4319
The important thing here is that the COPASI root element in your XML file (the real one at the Dropbox URL) declares a default namespace (http://www.copasi.org/static/schema). This means that the element and all its descendants, including Constant, belong to that namespace.
So instead of Constant elements, you need to look for {http://www.copasi.org/static/schema}Constant elements.
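To make the effect of the default namespace concrete, here is a minimal sketch that runs without the Dropbox file. The inline document is a hypothetical, cut-down stand-in for ct.cps (only the namespace URI and the two Constant lines are taken from the question), and it uses the standard library's xml.etree.ElementTree, assuming the findall and get calls behave the same way as in lxml.etree:

```python
import xml.etree.ElementTree as ET  # assumption: lxml.etree behaves the same for these calls

# Hypothetical, cut-down stand-in for ct.cps; the real file is much larger.
SAMPLE = """<COPASI xmlns="http://www.copasi.org/static/schema">
  <ListOfConstants>
    <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
    <Constant key="Parameter_4343" name="km" value="479.617"/>
  </ListOfConstants>
</COPASI>"""

root = ET.fromstring(SAMPLE)

# The unqualified name matches nothing: every element lives in the default namespace.
plain = root.findall(".//Constant")

# The qualified {uri}name form matches both elements.
qualified = root.findall(".//{http://www.copasi.org/static/schema}Constant")

print(root.tag)                            # the root tag carries the namespace too
print(len(plain))                          # 0
print([c.get("key") for c in qualified])   # ['Parameter_4344', 'Parameter_4343']
```

Printing root.tag is a quick way to check whether a document uses a default namespace: if the tag starts with {some-uri}, every unqualified search will come back empty. The fact that the Constant elements have no content plays no role in the lookup.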
See http://lxml.de/tutorial.html#namespaces.
Here is how you could do it using XPath instead of findall:
from lxml import etree

# Map a prefix of your choice to the namespace URI; "c" is arbitrary.
NSMAP = {"c": "http://www.copasi.org/static/schema"}

parsed = etree.parse('ct.cps')
for a in parsed.xpath("//c:Constant", namespaces=NSMAP):
    print(a.attrib["key"])
See http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
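The same prefix map can also be passed to findall, which avoids repeating the full URI in every path. The sketch below is an illustration rather than part of the original answer: the inline document is a hypothetical stand-in for ct.cps, and it uses the standard library's xml.etree.ElementTree on the assumption that findall(path, namespaces=...) works the same way in lxml.etree.

```python
import xml.etree.ElementTree as ET  # assumption: lxml.etree accepts the same arguments

# Prefix "c" is arbitrary; only the URI has to match the document.
NSMAP = {"c": "http://www.copasi.org/static/schema"}

# Hypothetical, cut-down stand-in for ct.cps.
SAMPLE = """<COPASI xmlns="http://www.copasi.org/static/schema">
  <ListOfConstants>
    <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
    <Constant key="Parameter_4343" name="km" value="479.617"/>
  </ListOfConstants>
</COPASI>"""

root = ET.fromstring(SAMPLE)

# Prefixed path plus a prefix-to-URI mapping, instead of the {uri}name form.
constants = root.findall(".//c:Constant", namespaces=NSMAP)

# Collect the parameters by name, converting the value attribute to float.
values = {c.get("name"): float(c.get("value")) for c in constants}
print(values)  # {'Kcat': 433.724, 'km': 479.617}
```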