How do I parse and write XML using Python's ElementTree without moving namespaces around?

Question

Our project receives XML of this form from upstream:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
    ...
  </appSettings>
</configuration>

It then reads and parses this XML using ElementTree, and for every app setting matching a certain key ("foo") it writes a new value that it knows about but the upstream process doesn't (in this case, key "foo" should have the value "bar").
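The value-override step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `override_setting` and the inline sample document are assumptions made for the example.

```python
from xml.etree import ElementTree as ET

def override_setting(root, key, value):
    """Set value= on every <appSettings>/<add> whose key= matches.

    Hypothetical helper; returns the number of elements changed.
    """
    changed = 0
    # <appSettings> and <add> carry no namespace, so plain tag names match.
    for add in root.findall('./appSettings/add'):
        if add.get('key') == key:
            add.set('value', value)
            changed += 1
    return changed

# Small stand-in for the upstream document.
sample = ET.fromstring(
    '<configuration><appSettings>'
    '<add key="foo" value="default" /><add key="other" value="x" />'
    '</appSettings></configuration>'
)
changed = override_setting(sample, 'foo', 'bar')
```

Editing the tree is the easy part; the difficulty described in the rest of the question only shows up when the tree is written back out.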

The downstream process consuming the filtered XML is, aaahhhh... fragile. It expects to receive the XML in exactly the form above.

If I parse this XML without registering a namespace, then ElementTree mangles my tree like this on input:

<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
  <runtime>
    <ns0:assemblyBinding>
      <ns0:dependentAssembly>
        <ns0:assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
        <ns0:bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
      </ns0:dependentAssembly>
    </ns0:assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
    ...
  </appSettings>
</configuration>
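The behaviour can be reproduced with a plain parse-and-serialize round trip. The sketch below uses a shortened version of the document (an assumption made to keep it small). ElementTree stores each tag internally as `{uri}localname`, and the `ns0` prefix is generated when the tree is serialized, because no prefix has been registered for the URI.

```python
from xml.etree import ElementTree as ET

SOURCE = '''<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly />
    </assemblyBinding>
  </runtime>
</configuration>'''

root = ET.fromstring(SOURCE)

# In memory the element keeps its namespace as a '{uri}' qualifier on the tag.
binding_tag = root[0][0].tag

# On output, the declaration is hoisted to the root and a prefix is invented.
round_tripped = ET.tostring(root, encoding='unicode')
print(round_tripped)
```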

The downstream process can't handle this, because it's not clever enough to realize that, semantically, this is the same thing. So I decided to register the namespace I know the upstream process will provide as a default namespace, to avoid the prefixes showing up everywhere, and now I get this:

<configuration xmlns="urn:schemas-microsoft-com:asm.v1">
  <runtime>
    <assemblyBinding>
      <dependentAssembly>
        <assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
        <bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
    ...
  </appSettings>
</configuration>

I don't know much about XML, but the downstream component cries about this too, and it seems to me that this changes the meaning: doesn't the default xmlns now apply to all elements inside <configuration>, whereas before it only applied to the <assemblyBinding> element?
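That suspicion can be checked by parsing both shapes and comparing the qualified tag names ElementTree reports. The two one-line documents and the helper `qualified_tags` below are simplified stand-ins introduced for illustration.

```python
from xml.etree import ElementTree as ET

NS = 'urn:schemas-microsoft-com:asm.v1'

# Declaration on <assemblyBinding>, as upstream sends it.
original = (
    '<configuration><runtime>'
    f'<assemblyBinding xmlns="{NS}" />'
    '</runtime></configuration>'
)
# Declaration hoisted to the root, as in the output shown above.
hoisted = (
    f'<configuration xmlns="{NS}"><runtime>'
    '<assemblyBinding />'
    '</runtime></configuration>'
)

def qualified_tags(xml_text):
    """Return every element's tag in document order, in '{uri}name' form."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

original_tags = qualified_tags(original)
hoisted_tags = qualified_tags(hoisted)
```

In the first document only `assemblyBinding` is in the namespace; in the second, `configuration` and `runtime` are as well. The two are therefore different documents to any namespace-aware consumer, not just a cosmetic variation.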

Is there any way, using ElementTree, to handle this namespace so that I can take in the upstream's XML, set foo's value, and then pass it on downstream, without moving the namespace around, leaving it exactly as I found it?

  • I could use an lxml-based solution, which seems to handle this. However, lxml has a dependency on C which the downstream component would really like not to have to support: a pure Python solution is preferable.

  • I could read the document as HTML, which would ignore the namespace attribute, let me manipulate the value I want, and then pass on the document. However, I have yet to find a Python parser that doesn't downcase all the element names, and my downstream component requires the casing of all element names to be preserved.

  • I could resort to string parsing and regular expressions. I would rather not write my own parser.

The only advice I could find so far about namespace handling in ElementTree suggests the "register a default namespace to avoid prefixes" approach, which I assumed would be suitable, but ElementTree then insists on moving the xmlns declaration up to the root node upon dumping.

I could also be clever and build up a string that dumps the tree out in stages, in exactly the right order to put the xmlns declaration back on the "right node", but that also strikes me as pretty darned fragile.

Has anyone managed to get past a problem like this?

Recommended answer

As far as I know, the solution that best suits your needs is to write a custom renderer in pure Python using the features exposed by xml.etree.ElementTree. Here is one possible solution:

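from re import findall, sub
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration once, at the top level only.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    # ET._namespaces is a private helper (it may change between Python versions);
    # it returns a {namespace URI: prefix} map for the whole tree.
    if not level:
        _, namespaces = ET._namespaces(element)
    indent = ' ' * indent_size * level
    # Strip the '{uri}' qualifier that ElementTree stores in front of the local name.
    # Prefixes are never written on tags, so this only suits a default namespace.
    tag = sub(r'({[^}]+}\s*)*', '', element.tag)
    buffer += f'{indent}<{tag}'
    # Declare each namespace on the first element that uses it, then forget it
    # so that its descendants do not repeat the declaration.
    for ns in findall(r'{[^}]+}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces:
            continue
        prefix = namespaces.pop(ns_key)
        buffer += ' xmlns' + (f':{prefix}' if prefix else '') + f'="{ns_key}"'
    for k, v in element.attrib.items():
        value = escape(v, {'"': '&quot;'})
        buffer += f' {k}="{value}"'
    text = element.text.strip() if element.text else ''
    buffer += '>' + escape(text)
    # Render each child exactly once, one indentation level deeper.
    children = list(element)
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if children else f'</{tag}>\n'
    return buffer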

Pass the XML data you gave to the render function above, as shown below:

data=\
'''<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
  </appSettings>
</configuration>'''

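# Register the URI as the default namespace so that no prefix is generated for it.
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
e = ET.fromstring(data)
r = ET.ElementTree(e)
print(render(r))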

You'll get the following XML, which keeps the xmlns declaration on the <assemblyBinding> element as you asked. Note that empty elements are written as open/close pairs rather than self-closing tags:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral"></assemblyIdentity>
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0"></bindingRedirect>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default"></add>
  </appSettings>
</configuration>
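The render function above relies on ET._namespaces, which is a private helper. As a possible alternative for finding out where declarations sit in the input, the documented `start-ns` event of `ET.iterparse` can be used to record which element each xmlns declaration was attached to. The following is only a sketch under that assumption; the function name `namespace_declarations` is made up for the example and a renderer would still need to consume its result.

```python
import io
from xml.etree import ElementTree as ET

def namespace_declarations(xml_bytes):
    """Parse xml_bytes and return (root, decls).

    decls maps each element to the list of (prefix, uri) pairs declared on
    it in the source. Hypothetical helper for illustration.
    """
    decls = {}
    pending = []   # declarations seen since the last 'start' event
    root = None
    events = ET.iterparse(io.BytesIO(xml_bytes), events=('start-ns', 'start'))
    for event, payload in events:
        if event == 'start-ns':
            # Reported before the 'start' event of the declaring element.
            pending.append(payload)
        else:
            if root is None:
                root = payload
            if pending:
                decls[payload] = pending
                pending = []
    return root, decls

doc = (b'<configuration><runtime>'
       b'<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1" />'
       b'</runtime></configuration>')
root, decls = namespace_declarations(doc)
declared_on = {el.tag: pairs for el, pairs in decls.items()}
```

Here `declared_on` has a single entry, for the `assemblyBinding` element, with an empty prefix (the default namespace) and the asm.v1 URI, which is the information needed to write the declaration back on the same node.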

I know I came late to the party. Anyway, I hope this helps you and the many others who have the same issue. Happy coding!
