How to parse deeply nested YAML data structures in Python
Problem description
We have a YAML file which looks somewhat like the following:
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
How would I go about getting the value "bar." and the value for the key "nfs"?
Python code:
import yaml

with open("/Users/brendan_vandercar/sites.yaml", 'r') as stream:
    data_loaded = yaml.load(stream)

for element in data_loaded:
    name = "element"['all']['children']['allnetxsites']['children']['netxsites']['hosts']['bart']['nfs'][0]
    print(name)
What I would like to get is a list output from this script that has the below:
Domain: bart.local.domain
NFS: lars.local.domain
Recommended answer
Your title makes it look like you are a bit confused about what is
going on, or at least about terminology: although "YAML data
structure" might be construed as shorthand for "Python data structure
loaded from a YAML document", you do not further parse that data
structure. Any parsing is done as part of the loading of the YAML
document and parsing is completely finished even before yaml.load()
returns. As a result of that loading you have a data structure in Python and
you "just" need to look up a key in a nested Python data structure by
recursively walking that data structure.
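To make that concrete, here is a minimal sketch, assuming the structure is fixed and known in advance, that PyYAML is installed, and that the sample is held in a string rather than read from the file. The names hosts, hostname and settings are only illustrative. Once yaml.safe_load() has returned, everything is ordinary dict indexing:

```python
import yaml

yaml_str = """\
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
"""

# All parsing happens inside safe_load(); the result is plain Python data.
data = yaml.safe_load(yaml_str)

# Index step by step down to the mapping of hosts.
hosts = data['all']['children']['allnetxsites']['children']['netxsites']['hosts']

# Each key of that mapping is a host name, each value a mapping of settings.
for hostname, settings in hosts.items():
    print(hostname)
    print("Domain:", settings['domain'])
    print("NFS:", settings['nfs'])
```

This only works while every intermediate key exists, which is why a recursive walk is the more robust choice when the nesting can vary.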
Your YAML example is somewhat uninteresting, as it represents only a tiny subset of real YAML: it consists solely of mappings, (plain) scalars that are strings, and mapping keys that are scalars.
To walk over that data structure a simplified version of the recursive function @aaaaaa presented will do:
import sys
import yaml

yaml_str = """\
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
"""

data = yaml.safe_load(yaml_str)

def find(key, dictionary):
    # everything is a dict
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in find(key, v):
                yield result

for x in find("nfs", data):
    print(x)
which prints the expected:
lars.local.domain
I have simplified the function find because the list handling in the version in the snippet is incorrect.
Although the kinds of scalars used do not affect the recursive lookup, you probably want a more generic solution that can handle YAML with (nested) sequences, tagged nodes and complex mapping keys as well.
Assuming your input file to be the slightly more complex input.yaml:
all:
  {a: x}: !xyz
  - [k, l, 0943]
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
You can use ruamel.yaml (disclaimer: I am the author of that package) to do:
import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('input.yaml')

yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)

def lookup(sk, d, path=[]):
    # look up the values for key sk; yields tuples of (path to the value, value)
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            for res in lookup(sk, v, path + [k]):
                yield res
    elif isinstance(d, list):
        for item in d:
            for res in lookup(sk, item, path + [item]):
                yield res

for path, value in lookup("nfs", data):
    print(path, '->', value)
which gives:
['all', 'children', 'allnetxsites', 'children', 'netxsites', 'hosts', 'bar.', 'nfs'] -> lars.local.domain
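The path ends at the nfs key, so the element before it is the host name. If what you are after is the Domain/NFS listing from the question, a variation that yields the enclosing mapping may be more convenient. The following is only a sketch: find_parents is a hypothetical helper (not part of PyYAML or ruamel.yaml), and the data literal is a hand-written stand-in with the same shape as the sample from the question, so the block runs without any YAML library. It relies on the same isinstance checks as lookup.

```python
def find_parents(key, node, name=None):
    """Yield (name, mapping) for every mapping that directly contains key.

    name is the key under which the mapping was found (None at the root).
    """
    if isinstance(node, dict):
        if key in node:
            yield name, node
        for k, v in node.items():
            yield from find_parents(key, v, k)
    elif isinstance(node, list):
        # Items of a list have no key of their own; keep the enclosing name.
        for item in node:
            yield from find_parents(key, item, name)


# Stand-in for the loaded data (assumption: same shape as the question's sample).
data = {
    "all": {
        "children": {
            "allnetxsites": {
                "children": {
                    "netxsites": {
                        "hosts": {
                            "bar.": {
                                "ansible_ssh_host": "bart.j",
                                "domain": "bart.local.domain",
                                "nfs": "lars.local.domain",
                            }
                        }
                    }
                }
            }
        }
    }
}

for host, settings in find_parents("nfs", data):
    print(host)
    print("Domain:", settings.get("domain"))
    print("NFS:", settings["nfs"])
```

Using settings.get("domain") rather than indexing means a host that has an nfs entry but no domain entry prints None instead of raising KeyError.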
As PyYAML only parses a subset of YAML 1.1 and loads even less of that, it cannot handle the valid YAML in input.yaml.
The abovementioned snippet, the one @aaaaa is using, will break on the loaded YAML because of the (directly) nested sequences/lists.