Python:使用 ElementTree 更新 XML 文件,同时尽可能地保留布局 [英] Python: Update XML-file using ElementTree while conserving layout as much as possible

查看:39
本文介绍了Python:使用 ElementTree 更新 XML 文件,同时尽可能地保留布局的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个使用 XML 命名空间的文档,我想将其 /group/house/dogs 增加一:(文件名为 houses.xml)

I have a document which uses an XML namespace for which I want to increase /group/house/dogs by one: (the file is called houses.xml)

<?xml version="1.0"?>
<group xmlns="http://dogs.house.local">
    <house>
            <id>2821</id>
            <dogs>2</dogs>
    </house>
</group>

我当前使用以下代码的结果是:(创建的文件名为houses2.xml)

My current result using the code below is: (the created file is called houses2.xml)

<ns0:group xmlns:ns0="http://dogs.house.local">
    <ns0:house>
        <ns0:id>2821</ns0:id>
        <ns0:dogs>3</ns0:dogs>
    </ns0:house>
</ns0:group>

我想解决两件事(如果可以使用 ElementTree.如果不是,我很乐意提供关于我应该使用什么的建议):

I would like to fix two things (if it is possible using ElementTree. If it isn´t, I´d be greatful for a suggestion as to what I should use instead):

  1. 我想保留 行.
  2. 我不想为所有标签添加前缀,我想保持原样.

总而言之,我不想过多地弄乱文档.

我当前的代码(除了上述缺陷外都有效)生成上述结果如下.

My current code (which works except for the above mentioned flaws) generating the above result follows.

我制作了一个实用函数,它使用 ElementTree 加载一个 XML 文件并返回 elementTree 和命名空间(因为我不想对命名空间进行硬编码,并且愿意承担它所暗示的风险):

I have made a utility function which loads an XML file using ElementTree and returns the elementTree and the namespace (as I do not want to hard code the namespace, and am willing to take the risk it implies):

def elementTreeRootAndNamespace(xml_file):
    from xml.etree import ElementTree
    import re
    element_tree = ElementTree.parse(xml_file)

    # Search for a namespace on the root tag
    namespace_search = re.search('^({\S+})', element_tree.getroot().tag)
    # Keep the namespace empty if none exists, if a namespace exists set
    # namespace to {namespacename}
    namespace = ''
    if namespace_search:
        namespace = namespace_search.group(1)

    return element_tree, namespace

这是我更新狗的数量并将其保存到新文件houses2.xml的代码:

This is my code to update the number of dogs and save it to the new file houses2.xml:

elementTree, namespace = elementTreeRootAndNamespace('houses.xml')

# Insert the namespace before each tag when when finding current number of dogs,
# as ElementTree requires the namespace to be prefixed within {...} when a
# namespace is used in the document.
dogs = elementTree.find('{ns}house/{ns}dogs'.format(ns = namespace))

# Increase the number of dogs by one
dogs.text = str(int(dogs.text) + 1)

# Write the result to the new file houses2.xml.
elementTree.write('houses2.xml')

推荐答案

不幸的是,往返并不是一个小问题.对于 XML,除非您使用特殊的解析器(例如 DecentXML 但那是针对 Java 的).

Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).

根据您的需要,您有以下选择:

Depending on your needs, you have the following options:

  • 如果您控制源代码并且可以通过单元测试保护您的代码,您就可以编写自己的简单解析器.此解析器不接受 XML,而只接受有限的子集.例如,您可以将整个文档作为字符串读取,然后使用 Python 的字符串操作来定位 并替换下一个 < 之前的任何内容.黑客?是的.

  • If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.

您可以过滤输出.XML 只允许字符串 <ns0: 出现在一处,因此您可以使用 < 搜索并替换它,然后使用 <group xmlns 搜索和替换它:ns0="CDATA.

You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0="<group xmlns=". This is pretty safe unless you can have CDATA in your XML.

您可以编写自己的简单 XML 解析器.将输入作为字符串读取,然后为每对 <> 加上它们在输入中的位置创建元素.这使您可以快速拆分输入,但仅适用于小输入.

You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.

这篇关于Python:使用 ElementTree 更新 XML 文件,同时尽可能地保留布局的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆