Getting duplicate keys in YAML using Python
Question
We need to parse YAML files that contain duplicate keys, and every one of those entries has to be parsed; skipping duplicates is not enough. I know this is against the YAML spec and I would rather not have to do it, but a third-party tool we use allows this usage and we need to deal with it.
Example file:
build:
  step: 'step1'
build:
  step: 'step2'
After parsing we should have a data structure similar to this:
yaml.load('file.yml')
# [('build', [('step', 'step1')]), ('build', [('step', 'step2')])]
A dict can no longer be used to represent the parsed contents.
I am looking for a solution in Python and did not find a library supporting this. Have I missed anything?
Alternatively, I am happy to write my own thing but would like to keep it as simple as possible. ruamel.yaml looks like the most advanced YAML parser in Python and it looks moderately extensible; can it be extended to support duplicate fields?
Recommended answer
PyYAML will just silently overwrite the first entry; ruamel.yaml¹ will give a DuplicateKeyFutureWarning if used with the legacy API, and raise a DuplicateKeyError with the new API.
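To make the silent overwrite concrete, here is a minimal sketch that uses only a plain Python dict (no YAML library is involved; the `pairs` list is hand-written sample data mirroring the example file). Filling a dict key by key keeps only the last value for a repeated key, whereas a list of pairs keeps both:

```python
# Plain-dict illustration only; this does not call PyYAML or ruamel.yaml.
pairs = [
    ("build", {"step": "step1"}),
    ("build", {"step": "step2"}),
]

# Assign key by key: a later value for the same key replaces the earlier one.
as_dict = {}
for key, value in pairs:
    as_dict[key] = value

print(as_dict)     # only the 'step2' entry is left under 'build'
print(len(pairs))  # the list of pairs still holds both entries
```

This is why the approach below returns a list of (key, value) tuples instead of a dict.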
If you don't want to create a full Constructor for all types, overriding the mapping constructor in SafeConstructor should do the job:
import sys
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor

yaml_str = """\
build:
  step: 'step1'
build:
  step: 'step2'
"""

def construct_yaml_map(self, node):
    # represent every mapping as a list of (key, value) tuples
    data = []
    yield data
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        val = self.construct_object(value_node, deep=True)
        data.append((key, val))

# replace the default constructor for YAML mappings
SafeConstructor.add_constructor('tag:yaml.org,2002:map', construct_yaml_map)
yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
which gives:
[('build', [('step', 'step1')]), ('build', [('step', 'step2')])]
However, it doesn't seem necessary to turn step: 'step1' into a list. The following only creates a list if there are duplicate keys (it could be optimised, if necessary, by caching the result of self.construct_object(key_node, deep=True)):
def construct_yaml_map(self, node):
    # test if there are duplicate node keys
    keys = set()
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        if key in keys:
            break
        keys.add(key)
    else:
        # no duplicates found: build a normal dict
        data = {}  # type: Dict[Any, Any]
        yield data
        value = self.construct_mapping(node)
        data.update(value)
        return
    # duplicates found: fall back to a list of (key, value) tuples
    data = []
    yield data
    for key_node, value_node in node.value:
        key = self.construct_object(key_node, deep=True)
        val = self.construct_object(value_node, deep=True)
        data.append((key, val))
which gives:
[('build', {'step': 'step1'}), ('build', {'step': 'step2'})]
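Code that consumes this result has to cope with a list of (key, value) tuples wherever duplicates occurred. As a minimal sketch, assuming data in the shape shown above, the values of repeated keys can be collected per key. The helper name `group_pairs` is hypothetical and not part of ruamel.yaml, and the sample data is typed by hand instead of being loaded from YAML:

```python
from collections import defaultdict


def group_pairs(pairs):
    """Collect the values of repeated keys into lists, in first-seen key order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Hand-written sample in the shape produced for a mapping with duplicate keys.
data = [('build', {'step': 'step1'}), ('build', {'step': 'step2'})]

builds = group_pairs(data)['build']
print(builds)  # [{'step': 'step1'}, {'step': 'step2'}]
```

Since the second construct_yaml_map returns a dict when there are no duplicates, a caller would probably want to check isinstance(data, list) before grouping.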
Some points:
- Probably needless to say, this will not work with YAML merge keys (<<: *xyz).
- If you need ruamel.yaml's round-trip capabilities (yaml = YAML()), that will require a more complex construct_yaml_map.
- If you want to dump the output, you should instantiate a new YAML() instance for that, instead of re-using the "patched" one used for loading (it might work, this is just to be sure):
yaml_out = YAML(typ='safe')
yaml_out.dump(data, sys.stdout)
which gives (for the first construct_yaml_map):
- - build
  - - [step, step1]
- - build
  - - [step, step2]
What works in neither PyYAML nor ruamel.yaml is yaml.load('file.yml'), because a string argument is parsed as the YAML document itself, not as a file name. If you don't want to open() the file yourself, you can do:
from pathlib import Path  # or: from ruamel.std.pathlib import Path
yaml = YAML(typ='safe')
data = yaml.load(Path('file.yml'))
¹ Disclaimer: I am the author of that package.