Getting lxml tag attributes with namespaces


Problem description

My XML looks like:

...
<termEntry id="c1">
    <langSet xml:lang="de">
    ...

And I have this code:

from lxml import etree
...

for term_entry in root.iterfind('.//termEntry'):
    print(term_entry.attrib['id'])
    print(term_entry.nsmap)

    for lang_set in term_entry.iterfind('langSet'):
        print(lang_set.nsmap)
        print(lang_set.attrib)

        for some_stuff in lang_set.iterfind('some_stuff'):
            ...

I get an empty nsmap dict, and my attrib dict looks like {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}

The file may not use the xml: prefix, or it may use a different namespace. How can I find out what namespace was used in the tag declaration? In fact, I just need to get a lang attribute; I don't care what namespace was used. I don't want to use a hack like lang_set.attrib.values()[0] or other lookups of a field with a known name.

Solution

"I just need to get a lang attribute, I don't care what namespace was used"

Your question is not very clear and you haven't provided any complete runnable code example. But doing some string manipulation as suggested by @mmgp in a comment may be enough.
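
A minimal sketch of that string-manipulation idea, under some assumptions: the sample document and its martif wrapper element are made up for illustration, and get_by_local_name is a hypothetical helper, not part of lxml. It uses etree.QName to split each attribute name into namespace and local name, then matches on the local name only:

```python
from lxml import etree

# Hypothetical sample document; only termEntry/langSet come from the question.
XML = b"""<martif>
  <termEntry id="c1">
    <langSet xml:lang="de"/>
  </termEntry>
</martif>"""


def get_by_local_name(element, local_name):
    """Return the first attribute value whose local name matches, else None.

    The namespace part of the attribute name is ignored on purpose.
    """
    for name, value in element.attrib.items():
        # QName splits '{namespace}local' into .namespace and .localname;
        # a plain name such as 'lang' has no namespace.
        if etree.QName(name).localname == local_name:
            return value
    return None


root = etree.fromstring(XML)
lang_set = root.find('.//langSet')
lang = get_by_local_name(lang_set, 'lang')
print(lang)
```

Note that this treats xml:lang, a plain lang, and any other prefixed lang attribute as interchangeable, which is exactly the caveat raised in the next paragraph.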

However, xml:lang is not the same as random_prefix:lang (or just lang). I think you should care about the namespace. If the objective is to identify the natural language that applies to an element's content, then you should be using xml:lang (because that is the explicit purpose of this attribute; see http://www.w3.org/TR/REC-xml/#sec-lang-tag).


"I just want to know where the {http://www.w3.org/XML/1998/namespace} string for attributes is stored."

It is important to know that the xml prefix is special. It is reserved (as opposed to almost all other namespace prefixes which are supposed to be arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace.

From the Namespaces in XML 1.0 W3C recommendation:

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Other uses of the xml prefix are the xml:space and xml:base attributes.


"It would be really strange if lxml did not provide any method for namespace processing"

lxml processes namespaces just fine, but prefixes are avoided as much as possible. You will need to use the http://www.w3.org/XML/1998/namespace namespace name when doing lookups that involve the xml prefix.
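
As a rough illustration of such a lookup (a sketch only: the inline sample XML and the constant names XML_NS and XML_LANG are assumptions made for this example), the attribute name can be built in lxml's {namespace}localname form and passed to get():

```python
from lxml import etree

# The namespace name that the reserved xml prefix is always bound to.
XML_NS = 'http://www.w3.org/XML/1998/namespace'
# Attribute name in the '{namespace}localname' form that lxml uses.
XML_LANG = '{%s}lang' % XML_NS

# Hypothetical sample: one langSet with xml:lang, one without.
root = etree.fromstring(
    b'<termEntry id="c1"><langSet xml:lang="de"/><langSet/></termEntry>'
)

# get() returns None when the attribute is absent, so no KeyError handling
# is needed for elements that carry no xml:lang.
languages = [lang_set.get(XML_LANG) for lang_set in root.iterfind('langSet')]
print(languages)
```

Because the xml prefix cannot be bound to any other namespace name, this lookup does not depend on nsmap or on any declaration in the file.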
