Parsing XML with Python - accessing elements


Question

I'm using lxml to parse some XML, but for some reason I can't find a specific element.

I'm trying to access the <Constant> elements.

Here's an XML snippet:

  </rdf:Description>
</rdf:RDF>
        </MiriamAnnotation>
        <ListOfSubstrates>
          <Substrate metabolite="Metabolite_5" stoichiometry="1"/>
        </ListOfSubstrates>
        <ListOfModifiers>
          <Modifier metabolite="Metabolite_9" stoichiometry="1"/>
        </ListOfModifiers>
        <ListOfConstants>
          <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
          <Constant key="Parameter_4343" name="km" value="479.617"/>

The code I'm using is like this:

    >>> from lxml import etree as ET
    >>> parsed = ET.parse('ct.cps')
    >>> root = parsed.getroot()
    >>> for a in root.findall(".//Constant"):
    ...     print(a.attrib['key'])
    ... 
    >>> for a in root.findall('Constant'):
    ...     print(a.get('key'))
    ... 
    >>> for a in root.findall('Constant'):
    ...     print(a.attrib['key'])
    ... 

As you can see, none of these loops prints anything.

What am I doing wrong?

I'm wondering if it has something to do with the fact that <Constant> elements are empty?

Source XML here: https://www.dropbox.com/s/i6hga7nvmcd6rxx/ct.cps?dl=0

Recommended answer

Here is how you can get the values you are looking for:

from lxml import etree

parsed = etree.parse('ct.cps')

# Qualify the tag with the document's default namespace ({uri}localname).
for a in parsed.findall(".//{http://www.copasi.org/static/schema}Constant"):
    print(a.attrib["key"])

Output:

Parameter_4344
Parameter_4343
Parameter_4342
Parameter_4341
Parameter_4340
Parameter_4339
Parameter_4338
Parameter_4337
Parameter_4336
Parameter_4335
Parameter_4334
Parameter_4333
Parameter_4332
Parameter_4331
Parameter_4330
Parameter_4329
Parameter_4328
Parameter_4327
Parameter_4326
Parameter_4325
Parameter_4324
Parameter_4323
Parameter_4322
Parameter_4321
Parameter_4320
Parameter_4319

The important thing here is that the COPASI root element in your XML file (the real one at the Dropbox URL) declares a default namespace (http://www.copasi.org/static/schema). This means that the element and all its unprefixed descendants, including Constant, belong to that namespace.

So instead of Constant elements, you need to look for {http://www.copasi.org/static/schema}Constant elements.

See http://lxml.de/tutorial.html#namespaces.
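
To see the effect without the original file, here is a minimal sketch on a made-up inline document. The XML string is an assumption modelled on the snippet in the question, not the real ct.cps. It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without any extra install; lxml's findall should behave the same way for these calls.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down document modelled on the question's snippet.
XML = """\
<COPASI xmlns="http://www.copasi.org/static/schema">
  <ListOfConstants>
    <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
    <Constant key="Parameter_4343" name="km" value="479.617"/>
  </ListOfConstants>
</COPASI>
"""

NS = "http://www.copasi.org/static/schema"
root = ET.fromstring(XML)

# Each tag carries its namespace in the form {uri}localname.
root_tag = root.tag

# A bare name only matches elements in no namespace, so this finds nothing.
unqualified = root.findall(".//Constant")

# The fully qualified name matches.
qualified = root.findall(".//{%s}Constant" % NS)
qualified_keys = [c.get("key") for c in qualified]

print(root_tag)
print(len(unqualified))
print(qualified_keys)
```

The Constant elements in this sketch are empty as well, and they are still found once the name is qualified, so emptiness is not what makes the original lookups fail. Printing root.tag is a quick way to check whether a document uses a default namespace.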

Here is how you could do it using XPath instead of findall:

from lxml import etree

# The prefix "c" is arbitrary; only the URI has to match the document.
NSMAP = {"c": "http://www.copasi.org/static/schema"}

parsed = etree.parse('ct.cps')

for a in parsed.xpath("//c:Constant", namespaces=NSMAP):
    print(a.attrib["key"])

See http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
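
findall also accepts a prefix map, which avoids spelling out the URI in every path. A small sketch, again on a made-up inline document (an assumption, not the real ct.cps) and with the standard library's ElementTree standing in for lxml, whose findall takes the same namespaces argument. The constants dictionary is only an illustration of what you might do with the matched elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down document modelled on the question's snippet.
XML = """\
<COPASI xmlns="http://www.copasi.org/static/schema">
  <ListOfConstants>
    <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
    <Constant key="Parameter_4343" name="km" value="479.617"/>
  </ListOfConstants>
</COPASI>
"""

# The prefix "c" is arbitrary; it is bound to the document's default namespace.
NSMAP = {"c": "http://www.copasi.org/static/schema"}

root = ET.fromstring(XML)

# Collect each constant's name and numeric value via the prefixed path.
constants = {
    c.get("name"): float(c.get("value"))
    for c in root.findall(".//c:Constant", namespaces=NSMAP)
}

print(constants)
```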
