Get attribute of complex element using lxml

Question

I have a simple XML file like the one below:

    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
      <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
      </maxspeed>

I want to parse it using lxml and get its values. For brandName, this is all it takes:

    'brand_name'  : m.findtext(NS+'brandName')

If I want to get its abbrev attribute:

    'brand_name'  : m.find(NS+'brandName').attrib['abbrev']

For maxspeed, I can get the value with:

    'maxspeed_value'                  : m.findtext(NS+'maxspeed/value'),

or:

    'maxspeed_value'                  : m.find(NS+'maxspeed/value').text,

Now I want to get the attributes of unit inside maxspeed. I have tried many different ways, but they all failed. Most of the time the error is:

    'NoneType' object has no attribute 'attrib'

Here are several ways I tried that failed:

    'maxspeed_unit'                  : m.find(NS+'maxspeed/value').attrib['abbrev'],
    'maxspeed_unit'                  : (m.find(NS+'maxspeed/value')).get('abbrev'),

Could you please give me a hint as to why it doesn't work? Thank you very much!

Updated XML:

    <Car xmlns="http://example.com/vocab/xml/cars#">
     <dateStarted>2011-02-05</dateStarted>
     <dateSold>2011-02-13</dateSold>
    <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
      <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
      </maxspeed>
      <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
      <power>
        <value>180</value>
        <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
      </power>
      <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>  
    </Car>

Recommended answer

When given a bare tag name, the .find method on an lxml Element only searches the direct children of that element. For example, in this XML:

<root>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
    <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
    </maxspeed>
</root>

You can call the root element's .find method with a tag name to locate the brandName element or the maxspeed element, but that search will not descend into those inner elements.

So you could, for example, do something like this:

root.find('maxspeed').find('unit') #returns the unit Element

From the returned element you can access the attributes.
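
As a minimal sketch of that last step, the snippet below chains the two .find calls on the small example document and then reads attributes from the unit element. The try/except import is an assumption of this sketch: it falls back to the standard library's xml.etree.ElementTree, whose find, get and attrib behave the same way for these calls, only so that the snippet also runs where lxml is not installed.

```python
# Sketch: prefer lxml, fall back to the standard library's compatible API.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

root = etree.fromstring(
    '<root>'
    '<maxspeed>'
    '<value>250</value>'
    '<unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />'
    '</maxspeed>'
    '</root>'
)

# Step down one level at a time: root -> maxspeed -> unit.
unit = root.find('maxspeed').find('unit')

# .attrib behaves like a dict and raises KeyError for a missing attribute;
# .get() returns None (or a supplied default) instead.
abbrev = unit.attrib['abbrev']
value = unit.get('value')
missing = unit.get('no_such_attribute')

print(abbrev, value, missing)
```

Using .get() is the more forgiving choice when an attribute may be absent, since it does not raise.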

If you'd like to search through all the elements within an XML document, you can use the .iter() method. So for the previous example you could write:

for element in root.iter(tag='unit'):
    print(element)  # This prints every unit element in the document.
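
With the namespaced XML from the question, every tag carries the namespace URI in braces, so the tag passed to .iter() needs it too. The sketch below collects the abbrev of every unit in a trimmed-down Car document. The NS constant and the shortened XML are assumptions made for illustration, and the import fallback to xml.etree.ElementTree is only there so the snippet runs without lxml.

```python
# Sketch: prefer lxml, fall back to the standard library's compatible API.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Assumed: the default namespace declared on <Car> in the question, in {uri} form.
NS = '{http://example.com/vocab/xml/cars#}'

car = etree.fromstring(
    '<Car xmlns="http://example.com/vocab/xml/cars#">'
    '<maxspeed><value>250</value><unit abbrev="mph" /></maxspeed>'
    '<power><value>180</value><unit abbrev="ph" /></power>'
    '</Car>'
)

# .iter() walks the whole subtree in document order and yields matching tags.
unit_abbrevs = [unit.get('abbrev') for unit in car.iter(NS + 'unit')]

print(unit_abbrevs)
```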

Edit: here is a small, fully functional example using the XML you provided:

import lxml.etree
from io import StringIO

def ns_join(element, tag, namespace=None):
    '''Joins the namespace and tag together, and
    returns the fully qualified name.
    @param element - The lxml.etree._Element you're searching
    @param tag - The tag you're joining
    @param namespace - (optional) The Namespace shortname default is None'''

    return '{%s}%s' % (element.nsmap[namespace], tag)

def parse_car(element):
    '''Parse a car element, This will return a dictionary containing
    brand_name, maxspeed_value, and maxspeed_unit'''

    maxspeed = element.find(ns_join(element,'maxspeed'))
    return { 
        'brand_name' : element.findtext(ns_join(element,'brandName')), 
        'maxspeed_value' : maxspeed.findtext(ns_join(maxspeed,'value')), 
        'maxspeed_unit' : maxspeed.find(ns_join(maxspeed, 'unit')).attrib['abbrev']
        }

#Create the StringIO object to feed to the parser.
XML = StringIO('''
<Reports>
    <Car xmlns="http://example.com/vocab/xml/cars#">
        <dateStarted>2011-02-05</dateStarted>
        <dateSold>2011-02-13</dateSold>
        <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
        <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
        <maxspeed>
            <value>250</value>
            <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
        </maxspeed>
        <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
        <power>
            <value>180</value>
            <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
        </power>
        <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>  
    </Car>
</Reports>
''')

#Get the root element object of the xml
car_root_element = lxml.etree.parse(XML).getroot()

# For each 'Car' tag in the root element,
# we want to parse it and save the list as cars
cars = [ parse_car(element) 
    for element in car_root_element.iter() if element.tag.endswith('Car')]

print(cars)
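
One likely cause of the 'NoneType' error in the question is that a path such as NS+'maxspeed/unit' puts the namespace on the first step only. The second step then asks for an un-namespaced unit element, which does not exist, so .find returns None. The sketch below shows both forms side by side. The NS constant and the trimmed XML are assumptions for illustration, and the fallback import only exists so the snippet runs without lxml.

```python
# Sketch: prefer lxml, fall back to the standard library's compatible API.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Assumed: the default namespace declared on <Car> in the question, in {uri} form.
NS = '{http://example.com/vocab/xml/cars#}'

car = etree.fromstring(
    '<Car xmlns="http://example.com/vocab/xml/cars#">'
    '<maxspeed><value>250</value>'
    '<unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />'
    '</maxspeed>'
    '</Car>'
)

# Namespace on the first step only: the 'unit' step matches nothing.
unqualified = car.find(NS + 'maxspeed/unit')

# Namespace on every step of the path.
unit = car.find(NS + 'maxspeed/' + NS + 'unit')

# Guard against None so a missing element does not raise AttributeError.
maxspeed_unit = unit.get('abbrev') if unit is not None else None

print(unqualified, maxspeed_unit)
```

This is a single-call alternative to chaining .find twice; both approaches need the namespace on each tag they look up.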

Hope that helps.
