Modify namespaces in a given XML document with lxml

Question

I have an xml-document that looks like this:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1> ... </framework:tag1>
   <framework:tag2> <tagA> ... </tagA> </framework:tag2>
</root>

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged. With some insights from this thread lxml: add namespace to input file, I tried the following:

from lxml import etree

def fix_nsmap(nsmap):
    """Update the old nsmap dict with the new schema URLs. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
      {"framework": "http://someurl/Newframework",
       None: "http://someurl/Newschema"}"""
    ...

def convert(xmlfile):
    root = etree.parse(xmlfile).getroot()
    # Strip the "{namespace}" part of the Clark-notation tag name.
    root_tag = root.tag.split("}")[1]
    nsmap = fix_nsmap(root.nsmap)
    new_root = etree.Element(root_tag, nsmap=nsmap)
    # Move the children of the old root under the new root.
    new_root[:] = root[:]
    # ... fix xsi:schemaLocation
    return etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
                          xml_declaration=True)

This produces the right 'attributes' in the root-tag but completely fails for the rest of the document:

<network xmlns:framework="http://someurl/Newframework"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://someurl/Newschema"
    xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
<ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:tag1>
<ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
          xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
</ns1:tag2>

What is wrong with my approach? Is there any other way to change the namespaces? Maybe I could use xslt?

Thanks!

Dennis

Recommended answer

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged.

I'd do a simple textual search-and-replace operation. It's much easier than fiddling with XML nodes. Like this:

with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)
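
One caveat with a plain str.replace is that it would also rewrite any longer URI that merely starts with one of the old URIs. If that matters for your documents, a slightly stricter variant could look like the sketch below. The names URI_MAP and replace_namespace_uris are made up for illustration, and the rule "a URI ends at a quote or at whitespace" is an assumption that fits the sample document, not a general XML rule.

```python
import re

# Old-to-new namespace URIs, taken from the question.
URI_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}

def replace_namespace_uris(text, uri_map):
    """Replace each old URI only where it stands as a complete token.

    A token is assumed to be delimited by quotes or whitespace, which
    covers xmlns="..." declarations and xsi:schemaLocation values.
    """
    alternatives = "|".join(re.escape(uri) for uri in uri_map)
    # The character before and after the URI must be a quote,
    # whitespace, or the start/end of the text.
    pattern = re.compile(r"(?<![^\s\"'])(" + alternatives + r")(?![^\s\"'])")
    return pattern.sub(lambda match: uri_map[match.group(1)], text)
```

It would slot into the snippet above as data = replace_namespace_uris(data, URI_MAP) in place of the two data.replace(...) lines.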


The other question that you were inspired by is about adding a new namespace (and keeping the old ones). But you are trying to modify existing namespace declarations. Creating a new root element and copying the child nodes does not work in this case.

This line:

new_root[:] = root[:]

turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces. So they have to be modified/recreated too. I guess it might be possible to come up with a reasonable way to do that, but I don't think you need it. Textual search-and-replace is good enough, IMHO.
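
If you do want to stay inside lxml, one possible way to "modify/recreate" the nodes is to retag every element first and only then rebuild the root. The following is a rough sketch, not a tested recipe: NS_MAP and rename_namespaces are names invented here, only element tags and xsi:schemaLocation on the root are handled (namespaced attributes elsewhere are not), and the prefixes that end up in the serialized output may depend on the lxml version.

```python
from lxml import etree

# Old-to-new namespace URIs, taken from the question.
NS_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}
XSI_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

def rename_namespaces(root, ns_map):
    """Return a new root whose elements use the new namespace URIs."""
    # 1. Retag every element (including the root) that lives in an old namespace.
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # comments and processing instructions have no tag name
        qname = etree.QName(el)
        if qname.namespace in ns_map:
            el.tag = "{%s}%s" % (ns_map[qname.namespace], qname.localname)

    # 2. nsmap is read-only on existing elements, so build a new root
    #    that declares the new URIs under the original prefixes.
    new_nsmap = {prefix: ns_map.get(uri, uri) for prefix, uri in root.nsmap.items()}
    new_root = etree.Element(root.tag, nsmap=new_nsmap)
    new_root.text = root.text

    # 3. Copy the root attributes, mapping the URIs inside xsi:schemaLocation.
    for name, value in root.attrib.items():
        if name == XSI_LOCATION:
            value = " ".join(ns_map.get(token, token) for token in value.split())
        new_root.set(name, value)

    # 4. Move the (already retagged) children and drop unused declarations.
    new_root[:] = root[:]
    etree.cleanup_namespaces(new_root)
    return new_root
```

Calling rename_namespaces(etree.parse("input.xml").getroot(), NS_MAP) and passing the result to etree.tostring would then take the place of the new_root construction in the question. For a one-off conversion this is still more machinery than the textual replacement.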
