Getting duplicate keys in YAML using Python

Question

We need to parse YAML files that contain duplicate keys, and every entry has to be parsed; it is not enough to skip duplicates. I know this is against the YAML spec and I would prefer not to have to do it, but a third-party tool we use accepts such files and we need to deal with them.

Example file:

build:
  step: 'step1'

build:
  step: 'step2'

After parsing, we should have a data structure similar to this:

yaml.load('file.yml')
# [('build', [('step', 'step1')]), ('build', [('step', 'step2')])]

A dict can no longer be used to represent the parsed contents.
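To see why, note that a Python dict built from (key, value) pairs keeps only the last value for a repeated key, which is the same silent overwrite PyYAML performs, while a list of pairs keeps every entry. A minimal illustration in plain Python, with no YAML library involved; the pair list is written out by hand to mirror the example file:

```python
# The two top-level entries of the example file as (key, value) pairs,
# written out by hand for illustration.
pairs = [
    ('build', [('step', 'step1')]),
    ('build', [('step', 'step2')]),
]

# Converting to a dict keeps only the last value for the repeated key 'build'.
as_dict = dict(pairs)
print(as_dict)  # {'build': [('step', 'step2')]}

# The list of pairs still holds both entries, in document order.
print(len(pairs))  # 2
```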

I am looking for a solution in Python but didn't find a library that supports this. Have I missed anything?

Alternatively, I am happy to write my own thing but would like to keep it as simple as possible. ruamel.yaml looks like the most advanced YAML parser in Python and seems moderately extensible. Can it be extended to support duplicate keys?

Answer

PyYAML will silently overwrite the first entry; ruamel.yaml¹ will give a DuplicateKeyFutureWarning if used with the legacy API, and raise a DuplicateKeyError with the new API.

If you don't want to create a full Constructor for all types, overriding the mapping constructor in SafeConstructor should do the job:

import sys
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor

yaml_str = """\
build:
  step: 'step1'

build:
  step: 'step2'
"""


def construct_yaml_map(self, node):
    # always build a list of (key, value) pairs, so duplicate keys are preserved
    data = []
    yield data
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        val = self.construct_object(value_node, deep=True)
        data.append((key, val))


# note: this patches SafeConstructor globally (every YAML(typ='safe') instance)
SafeConstructor.add_constructor(u'tag:yaml.org,2002:map', construct_yaml_map)
yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
print(data)

Which gives:

[('build', [('step', 'step1')]), ('build', [('step', 'step2')])]
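The yield data line makes construct_yaml_map a two-step constructor: the empty list is handed out first, and the generator is then run to completion to fill it. As far as I know this is how ruamel.yaml's (and PyYAML's) BaseConstructor drives generator constructors, and it is what lets recursive structures (an alias pointing back into its own anchor) resolve. The following is a simplified, hypothetical model of that protocol in plain Python, not ruamel.yaml code; construct_pairs is an illustrative name:

```python
from typing import Any, Iterator, List, Tuple

Pair = Tuple[Any, Any]


def construct_pairs(items: List[Pair]) -> Iterator[List[Pair]]:
    """Hypothetical stand-in for construct_yaml_map: yield first, fill later."""
    data: List[Pair] = []
    yield data  # step 1: hand out the (still empty) container
    for key, value in items:  # step 2: fill it when the caller resumes
        data.append((key, value))


gen = construct_pairs([('build', 'step1'), ('build', 'step2')])
container = next(gen)  # the caller now holds a reference to the list
print(container)  # []
for _ in gen:  # run the generator to completion
    pass
print(container)  # [('build', 'step1'), ('build', 'step2')]
```

Because the caller receives the list object before it is populated, anything that needs a reference to it during construction can already hold one.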

However, it doesn't seem necessary to turn step: 'step1' into a list. The following creates a list only if there are duplicate keys (this could be optimised, if necessary, by caching the results of self.construct_object(key_node, deep=True)):

def construct_yaml_map(self, node):
    # test if there are duplicate node keys
    keys = set()
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        if key in keys:
            break
        keys.add(key)
    else:
        # no duplicate keys: construct an ordinary dict
        data = {}  # type: Dict[Any, Any]
        yield data
        value = self.construct_mapping(node)
        data.update(value)
        return
    # duplicate keys found: fall back to a list of (key, value) pairs
    data = []
    yield data
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        val = self.construct_object(value_node, deep=True)
        data.append((key, val))

Which gives:

[('build', {'step': 'step1'}), ('build', {'step': 'step2'})]
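The decision this second construct_yaml_map makes can be shown in isolation. The sketch below is a hypothetical plain-Python helper, pairs_or_dict, that works on already-constructed (key, value) pairs rather than on YAML nodes: it returns a dict when all keys are unique and the list of pairs otherwise (keys must be hashable, as for any dict).

```python
from typing import Any, Dict, List, Tuple, Union

Pair = Tuple[Any, Any]


def pairs_or_dict(pairs: List[Pair]) -> Union[Dict[Any, Any], List[Pair]]:
    """Hypothetical helper: dict if keys are unique, else the list of pairs."""
    seen = set()
    for key, _ in pairs:
        if key in seen:
            return list(pairs)  # duplicate found: keep every pair, in order
        seen.add(key)
    return dict(pairs)  # no duplicates: an ordinary dict loses nothing


inner1 = pairs_or_dict([('step', 'step1')])
inner2 = pairs_or_dict([('step', 'step2')])
print(pairs_or_dict([('build', inner1), ('build', inner2)]))
# [('build', {'step': 'step1'}), ('build', {'step': 'step2'})]
```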

A few notes:

  • Needless to say, this will not work with YAML merge keys (<<: *xyz).
  • If you need ruamel.yaml's round-trip capabilities (yaml = YAML()), that will require a more complex construct_yaml_map.
  • If you want to dump the output, instantiate a new YAML() instance for that instead of re-using the "patched" one used for loading (re-using it might work; this is just to be safe):

yaml_out = YAML(typ='safe')
yaml_out.dump(data, sys.stdout)

(which gives, with the first construct_yaml_map):

- - build
  - - [step, step1]
- - build
  - - [step, step2]

  • What works in neither PyYAML nor ruamel.yaml is yaml.load('file.yml'): a plain string is parsed as the YAML document itself, not treated as a file name. If you don't want to open() the file yourself, you can do:

    from pathlib import Path  # or: from ruamel.std.pathlib import Path
    from ruamel.yaml import YAML
    yaml = YAML(typ='safe')
    data = yaml.load(Path('file.yml'))
    

  • ¹ Disclaimer: I am the author of that package.