How to get an attribute of an Element that is namespaced


Problem description

I'm parsing an XML document that I receive from a vendor every day, and it uses namespaces heavily. I've reduced the problem to a minimal subset here:

There are some elements I need to parse, all of which are children of an element with a specific attribute in it.
I am able to use lxml.etree.Element.findall(TAG, root.nsmap) to find the candidate nodes whose attribute I need to check.

I'm then trying to check the attribute of each of these Elements by the name I know it uses, which here is ss:Name. If the value of that attribute is the desired value, I'm going to dive deeper into that Element (to continue doing other things).

How can I do this?

The XML I'm parsing is roughly:

<FOO xmlns="SOME_REALLY_LONG_STRING"
 some gorp declaring a bunch of namespaces one of which is 
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT"
>
    <child_of_foo>
        ....
    </child_of_foo>
    ...
    <SomethingIWant ss:Name="bar" OTHER_ATTRIBS_I_DONT_CARE_ABOUT>
        ....
        <MoreThingsToLookAtLater>
            ....
        </MoreThingsToLookAtLater>
        ....
    </SomethingIWant>
    ...
</FOO>

I found the first SomethingIWant Element like so (ultimately I want them all, which is why I used findall):

from lxml import etree

tree = etree.parse(myfilename)
root = tree.getroot()
# I want just the first one for now
my_sheet = root.findall('ss:RecordSet', root.nsmap)[0]

Now I want to get the ss:Name attribute from this element in order to check it, but I'm not sure how.

I know that my_sheet.attrib will show me the raw URI followed by the attribute name, but I don't want that. I need to check whether the element has a specific value for a specific namespaced attribute (because if the value is wrong, I can skip this element entirely).

I tried using lxml.etree.ElementTree.attrib.get(), but I don't seem to get anything useful back.

Any ideas?

Solution

One advantage of lxml over the standard Python XML parser is its full support for the XPath 1.0 specification via the xpath() method, so I would use xpath() most of the time. A working example for your current case:

from lxml import etree

xml = """<FOO xmlns="SOME_REALLY_LONG_STRING"
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT"
>
    <child_of_foo>
        ....
    </child_of_foo>
    ...
    <SomethingIWant ss:Name="bar">
        ....
    </SomethingIWant>
    ...
</FOO>"""

root = etree.fromstring(xml)
ns = {'ss': 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT'}

# i want just the first one for now
result = root.xpath('//@ss:Name', namespaces=ns)[0]
print(result)

Output:

bar
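
For comparison, the same attribute can also be read without XPath. The sketch below assumes the placeholder URIs from the example above and relies on lxml storing a namespaced attribute under the key {namespace-uri}local-name (the "raw URI followed by the attribute name" that attrib shows). The trimmed XML and the names SS_URI, DEFAULT_URI, and name_key are illustrative only.

```python
from lxml import etree

# Trimmed version of the sample document; the URIs are the placeholders used above.
xml = """<FOO xmlns="SOME_REALLY_LONG_STRING"
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT">
    <SomethingIWant ss:Name="bar"/>
</FOO>"""

SS_URI = 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT'  # URI bound to the ss prefix
DEFAULT_URI = 'SOME_REALLY_LONG_STRING'              # default namespace of FOO

root = etree.fromstring(xml)

# Elements in the default namespace are addressed as {uri}tag.
element = root.find('{%s}SomethingIWant' % DEFAULT_URI)

# QName builds the '{uri}Name' key, so the braces need not be typed by hand.
name_key = etree.QName(SS_URI, 'Name').text

value = element.get(name_key)                            # attribute value, or None
missing = element.get('{%s}NoSuchAttribute' % SS_URI)    # absent attribute

print(value)
```

get() returns None when the attribute is absent, so an element can be skipped with a plain comparison such as `if element.get(name_key) != 'bar': continue`.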

UPDATE:

A modified example demonstrating how to get a namespaced attribute from the current element:

ns = {'ss': 'THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT', 'd': 'SOME_REALLY_LONG_STRING'}

element = root.xpath('//d:SomethingIWant', namespaces=ns)[0]
print(etree.tostring(element))

attribute = element.xpath('@ss:Name', namespaces=ns)[0]
print(attribute)

Output:

<SomethingIWant xmlns="SOME_REALLY_LONG_STRING" xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT" ss:Name="bar">
        ....
    </SomethingIWant>
    ...

bar
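
Since the goal in the question is to keep only the elements whose ss:Name has a particular value, the check can also be folded into the XPath expression itself. This is a minimal sketch rather than part of the original answer: the sample XML is trimmed and extended with a second element, and the prefix 'd' and the variable names are arbitrary choices. It also shows one way to build the prefix map from root.nsmap, whose default-namespace entry has the key None, which xpath() does not accept.

```python
from lxml import etree

# Trimmed sample with two candidates; only the first has ss:Name="bar".
xml = """<FOO xmlns="SOME_REALLY_LONG_STRING"
 xmlns:ss="THE_VERY_SAME_REALLY_LONG_STRING_AS_ROOT">
    <SomethingIWant ss:Name="bar">
        <MoreThingsToLookAtLater>keep me</MoreThingsToLookAtLater>
    </SomethingIWant>
    <SomethingIWant ss:Name="other"/>
</FOO>"""

root = etree.fromstring(xml)

# root.nsmap contains {None: <default namespace URI>, 'ss': ...}.
# XPath has no notion of a default namespace, so the None key is
# re-registered under the made-up prefix 'd'.
ns = {prefix or 'd': uri for prefix, uri in root.nsmap.items()}

# The predicate keeps only elements whose ss:Name equals the $wanted variable.
matches = root.xpath('//d:SomethingIWant[@ss:Name=$wanted]',
                     namespaces=ns, wanted='bar')

# "Dive deeper" into each matching element; findall accepts the same prefix map.
later = [child.text
         for element in matches
         for child in element.findall('d:MoreThingsToLookAtLater', ns)]

print(len(matches), later)
```

Passing the value as an XPath variable (the wanted keyword argument) avoids building the expression with string formatting.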
