Registering namespaces with lxml before parsing
Question
I am using lxml to parse XML from an external service. The XML uses namespace prefixes but does not declare them with xmlns attributes. I am trying to register the namespace by hand with register_namespace, but that doesn't seem to work.
from lxml import etree
xml = """
<Foo xsi:type="xsd:string">bar</Foo>
"""
etree.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
el = etree.fromstring(xml) # lxml.etree.XMLSyntaxError: Namespace prefix xsi for type on Foo is not defined
What am I missing? Oddly enough, looking at the lxml source code to try to understand what I might be doing wrong, it seems that the xsi namespace should already be there as one of the default namespaces.
Recommended answer
When an XML document is parsed and then saved again, lxml does not change any prefixes (and register_namespace has no effect).
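A minimal sketch of that round-trip behaviour. The namespace URI http://example.com/ns and the prefixes p and abc are made up for illustration; the point is only that the prefix found in the parsed document should survive serialization even though a different one was registered.

```python
from lxml import etree

# Hypothetical namespace URI and prefixes, chosen only for this illustration.
NS = "http://example.com/ns"

# Register "abc" as the preferred prefix for NS before parsing.
etree.register_namespace("abc", NS)

# The document itself declares the prefix "p" for the same namespace.
doc = etree.fromstring('<p:Foo xmlns:p="%s">bar</p:Foo>' % NS)

# Serializing keeps the prefix from the parsed document.
out = etree.tostring(doc).decode()
print(out)
```

The printed element should still use the p prefix, not abc.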
If your XML document does not declare its namespace prefixes, it is not namespace-well-formed. Using register_namespace before parsing cannot fix this.
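Since the declarations have to be present in the text that reaches the parser, one possible workaround is to add them before parsing. The sketch below wraps the fragment in a hypothetical wrapper element that declares the prefixes. It assumes the payload has no XML declaration or DOCTYPE of its own, and it assumes that xsd stands for the XML Schema namespace; neither is confirmed by the question.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumption: the service means the XML Schema namespace by "xsd".
XSD = "http://www.w3.org/2001/XMLSchema"

xml = """
<Foo xsi:type="xsd:string">bar</Foo>
"""

# Wrap the fragment in a throwaway element ("wrapper" is a hypothetical name)
# that declares the prefixes, so the text becomes namespace-well-formed.
wrapped = '<wrapper xmlns:xsi="%s" xmlns:xsd="%s">%s</wrapper>' % (XSI, XSD, xml)

root = etree.fromstring(wrapped)
el = root[0]  # the original Foo element

print(el.tag)                         # Foo
print(el.get("{%s}type" % XSI))       # xsd:string
print(el.text)                        # bar
```

This is string manipulation rather than a parser feature, so it is only as robust as the assumptions about the incoming payload.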
register_namespace defines the prefixes to be used when serializing a newly created XML document.
Example 1 (without register_namespace):
from lxml import etree
el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())
Output:
<ns0:Foo xmlns:ns0="http://example.com"/>
Example 2 (with register_namespace):
from lxml import etree
etree.register_namespace("abc", "http://example.com")
el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())
Output:
<abc:Foo xmlns:abc="http://example.com"/>
Example 3 (without register_namespace, but with a "well-known" namespace associated with a conventional prefix):
from lxml import etree
el = etree.Element('{http://www.w3.org/2001/XMLSchema-instance}Foo')
print(etree.tostring(el).decode())
Output:
<xsi:Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
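The same well-known prefix should also apply to attributes. As a rough sketch, the element from the question can be built programmatically instead of parsed. The attribute value xsd:string is just a string here, and no register_namespace call is made.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Build the element from the question instead of parsing it.
el = etree.Element("Foo")
el.text = "bar"

# Set the attribute using its fully qualified name in Clark notation.
el.set("{%s}type" % XSI, "xsd:string")

out = etree.tostring(el).decode()
print(out)
```

The serialized element should carry an xmlns:xsi declaration together with an xsi:type attribute, which is the declaration the service's XML is missing.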