Getting lxml tag attributes with namespaces
Problem description
My XML looks like:
    ...
    <termEntry id="c1">
      <langSet xml:lang="de">
    ...
And I have this code:
    from lxml import etree
    ...
    for term_entry in root.iterfind('.//termEntry'):
        print(term_entry.attrib['id'])
        print(term_entry.nsmap)
        for lang_set in term_entry.iterfind('langSet'):
            print(lang_set.nsmap)
            print(lang_set.attrib)
            for some_stuff in lang_set.iterfind('some_stuff'):
                ...
I get an empty nsmap dict, and my attrib dict looks like {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}
The file may not use the xml: prefix, or it may use a different namespace. How can I find out which namespace was used in the tag declaration? In fact, I just need to get the lang attribute; I don't care which namespace was used. I don't want to resort to hacks like lang_set.attrib.values()[0] or other lookups of a field with a known name.
"I just need to get a lang attribute, I don't care what namespace was used"
Your question is not very clear and you haven't provided any complete runnable code example. But doing some string manipulation as suggested by @mmgp in a comment may be enough.
However, xml:lang is not the same as random_prefix:lang (or just lang). I think you should care about the namespace. If the objective is to identify the natural language that applies to an element's content, then you should be using xml:lang (because that is the explicit purpose of this attribute; see http://www.w3.org/TR/REC-xml/#sec-lang-tag).
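For reference, the string manipulation mentioned above could look roughly like the sketch below. It matches attributes by local name only, ignoring the namespace. The helper name get_attr_by_local_name and the sample document with its <doc> wrapper element are made up for illustration; only etree.fromstring, find, attrib and etree.QName come from lxml.

```python
from lxml import etree

# Hypothetical sample document; the <doc> wrapper is an assumption.
SAMPLE = b"""<doc>
  <termEntry id="c1">
    <langSet xml:lang="de"/>
  </termEntry>
</doc>"""


def get_attr_by_local_name(element, local_name):
    """Return the first attribute value whose local name matches, else None.

    Hypothetical helper: the namespace part of the attribute name is ignored.
    """
    for name, value in element.attrib.items():
        # QName splits '{namespace}local' into its two parts.
        if etree.QName(name).localname == local_name:
            return value
    return None


root = etree.fromstring(SAMPLE)
lang_set = root.find('.//langSet')
lang = get_attr_by_local_name(lang_set, 'lang')
print(lang)  # de
```

Keep in mind that this would also match an unrelated attribute such as random_prefix:lang, which is the reason for recommending a namespace-aware lookup instead.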
"I just want to know where the {http://www.w3.org/XML/1998/namespace} string for attributes is stored."
It is important to know that the xml prefix is special. It is reserved (as opposed to almost all other namespace prefixes, which are supposed to be arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace.
From the Namespaces in XML 1.0 W3C recommendation:
"The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace."
Other uses of the xml prefix are the xml:space and xml:base attributes.
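A small sketch of what this means in practice: because the xml prefix is predefined, a document can use xml:lang or xml:space without any xmlns declaration, so nsmap stays empty and the namespace name appears only inside the attribute names themselves. The one-element sample document here is an assumption made for illustration.

```python
from lxml import etree

# Hypothetical one-element document that uses the xml prefix without declaring it.
root = etree.fromstring(b'<langSet xml:lang="de" xml:space="preserve"/>')

# nsmap only reflects xmlns declarations in scope; there are none here.
nsmap = dict(root.nsmap)

# Attribute names are kept in '{namespace}local' form.
attr_names = sorted(root.attrib.keys())

# QName exposes the namespace part of each attribute name.
namespaces = {etree.QName(name).namespace for name in attr_names}

print(nsmap)
print(attr_names)
print(namespaces)
```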
"It is really strange if lxml does not provide any method for namespace processing."
lxml processes namespaces just fine, but prefixes are avoided as much as possible. You will need to use the http://www.w3.org/XML/1998/namespace namespace name when doing lookups that involve the xml prefix.
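A minimal sketch of such a lookup, assuming a document shaped like the one in the question. The <doc> wrapper element, the second langSet without a language and the 'unknown' fallback value are assumptions added for illustration.

```python
from lxml import etree

# The namespace name that the reserved xml prefix is always bound to.
XML_NS = 'http://www.w3.org/XML/1998/namespace'
XML_LANG = '{%s}lang' % XML_NS

# Hypothetical sample document.
SAMPLE = b"""<doc>
  <termEntry id="c1">
    <langSet xml:lang="de"/>
    <langSet/>
  </termEntry>
</doc>"""

root = etree.fromstring(SAMPLE)

langs = []
for term_entry in root.iterfind('.//termEntry'):
    for lang_set in term_entry.iterfind('langSet'):
        # get() returns the default when the attribute is absent.
        langs.append(lang_set.get(XML_LANG, 'unknown'))

print(langs)  # ['de', 'unknown']
```

Using get() with a default avoids the KeyError that attrib[XML_LANG] would raise for elements that carry no xml:lang attribute.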