How to parse deeply nested yaml data structures in python

Question

We have a YAML file which looks somewhat like the following:

all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain

How would I go about getting the value bar. and the value for the key nfs?

Python code:

import yaml
with open("/Users/brendan_vandercar/sites.yaml", 'r') as stream:
    data_loaded = yaml.load(stream)

for element in data_loaded:
    name = "element"['all']['children']['allnetxsites']['children']['netxsites']['hosts']['bart']['nfs'][0]
    print(name)

What I would like to get is a list output from this script that has the below:

Domain: bart.local.domain
NFS: lars.local.domain

Answer

Your title makes it look like you are a bit confused about what is going on, or at least about terminology. Although "YAML data structure" might be construed as shorthand for "Python data structure loaded from a YAML document", you do not parse that data structure any further. All parsing happens while the YAML document is loaded, and it is completely finished before yaml.load() returns. The result of that loading is a data structure in Python, and you "just" need to look up a key in a nested Python data structure by recursively walking it.
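
When the location is fixed and known, this lookup is ordinary Python indexing. The sketch below uses a dict literal as a stand-in for what yaml.safe_load() returns for the question's file, so it runs without PyYAML. It then prints the two lines the question asks for.

```python
# Stand-in for: data = yaml.safe_load(stream)  (the result for the question's file)
data = {
    "all": {
        "children": {
            "allnetxsites": {
                "children": {
                    "netxsites": {
                        "hosts": {
                            "bar.": {
                                "ansible_ssh_host": "bart.j",
                                "domain": "bart.local.domain",
                                "nfs": "lars.local.domain",
                            }
                        }
                    }
                }
            }
        }
    }
}

# Follow the fixed path with plain dict indexing; no further parsing is involved.
hosts = data["all"]["children"]["allnetxsites"]["children"]["netxsites"]["hosts"]
host = hosts["bar."]

lines = [f"Domain: {host['domain']}", f"NFS: {host['nfs']}"]
for line in lines:
    print(line)
```

This only works when the path is known in advance. When it is not, you need a recursive walk.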

Your YAML example is somewhat uninteresting because it represents only a tiny subset of real YAML. It consists solely of plain scalars that are strings, mappings, and mapping keys that are scalars.

To walk over that data structure, a simplified version of the recursive function that @aaaaaa presented will do:

import yaml

yaml_str = """\
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
"""

data = yaml.safe_load(yaml_str)

def find(key, dictionary):
    # assumes every nested container is a dict; lists are not descended into
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)

for x in find("nfs", data):
    print(x)

Which prints the expected:

lars.local.domain
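
To get closer to the output the question asks for, the same kind of generator can locate the hosts mapping first. It can then report each host name together with its domain and nfs values. This is a sketch with a dict literal standing in for the loaded YAML. The helper name host_report is hypothetical.

```python
def find(key, dictionary):
    # Yield every value stored under `key` in nested dicts (lists are not descended into).
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)


def host_report(data):
    # Hypothetical helper: build "Host:", "Domain:" and "NFS:" lines for every host found.
    lines = []
    for hosts in find("hosts", data):
        for name, info in hosts.items():
            lines.append(f"Host: {name}")
            lines.append(f"Domain: {info.get('domain')}")
            lines.append(f"NFS: {info.get('nfs')}")
    return lines


# Stand-in for yaml.safe_load() output for the question's file.
data = {"all": {"children": {"allnetxsites": {"children": {"netxsites": {"hosts": {
    "bar.": {"ansible_ssh_host": "bart.j",
             "domain": "bart.local.domain",
             "nfs": "lars.local.domain"}}}}}}}}

for line in host_report(data):
    print(line)
```

Because the host names come from iterating the hosts mapping, this also answers the "get the value bar." part of the question without hard-coding that key.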

I have simplified the function find because the list handling in the snippet's version is incorrect.
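
For comparison, here is one way a find-style generator can handle lists correctly. It recurses into every dict value and every list element, including lists nested directly inside lists. This is an illustrative sketch, not the original snippet's code. The name find_any and the sample data are hypothetical.

```python
def find_any(key, node):
    # Yield values stored under `key` anywhere in nested dicts and lists.
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, so matches inside a matching value are found too.
            yield from find_any(key, v)
    elif isinstance(node, list):
        for item in node:
            # List items can be dicts, lists (directly nested sequences) or scalars.
            yield from find_any(key, item)


data = {
    "hosts": [
        {"name": "a", "nfs": "n1"},
        [{"nfs": "n2"}, ["x", {"nfs": "n3"}]],
        "plain scalar",
    ]
}
print(list(find_any("nfs", data)))  # ['n1', 'n2', 'n3']
```

Scalars such as strings fall through both branches and yield nothing, so the recursion stops there.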

Although the kinds of scalars used do not affect the recursive lookup, you probably want a more generic solution that can handle YAML with (nested) sequences, tagged nodes and complex mapping keys as well.

Assuming your input file to be the slightly more complex input.yaml:

all:
  {a: x}: !xyz
  - [k, l, 0943]
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain

You can use ruamel.yaml (disclaimer: I am the author of that package) to do:

from pathlib import Path
import ruamel.yaml

in_file = Path('input.yaml')

yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)

def lookup(sk, d, path=None):
    # look up the values for key sk; yield (path to the value, value) tuples
    path = [] if path is None else path
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            yield from lookup(sk, v, path + [k])
    elif isinstance(d, list):
        for index, item in enumerate(d):
            # record the list position, not the item itself, in the path
            yield from lookup(sk, item, path + [index])

for path, value in lookup("nfs", data):
    print(path, '->', value)

Gives:

['all', 'children', 'allnetxsites', 'children', 'netxsites', 'hosts', 'bar.', 'nfs'] -> lars.local.domain
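
Because lookup yields the full key path alongside each value, you can feed that path back into the data to read or update the same node. The stdlib-only sketch below uses functools.reduce with operator.getitem. The helper names get_by_path and set_by_path and the shortened sample data are illustrative assumptions.

```python
from functools import reduce
import operator


def get_by_path(data, path):
    # Apply data[p0][p1]... for each element of the path in turn.
    return reduce(operator.getitem, path, data)


def set_by_path(data, path, value):
    # Navigate to the parent container, then assign the last key or index.
    get_by_path(data, path[:-1])[path[-1]] = value


data = {"all": {"children": {"hosts": {"bar.": {"nfs": "lars.local.domain"}}}}}
path = ["all", "children", "hosts", "bar.", "nfs"]

print(get_by_path(data, path))  # lars.local.domain
set_by_path(data, path, "other.local.domain")
print(data["all"]["children"]["hosts"]["bar."]["nfs"])  # other.local.domain
```

Paths only round-trip this way when list positions are recorded as integer indices, since operator.getitem needs an index to step into a list.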

As PyYAML only parses a subset of YAML 1.1 and loads even less of that, it cannot handle the valid YAML in input.yaml.
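
One concrete problem lies in the complex key {a: x}, which is itself a mapping. A plain Python dict cannot serve as a dict key because it is unhashable. The stdlib-only sketch below reproduces that failure in isolation. Converting the key to an immutable form is one way to make it usable; the frozenset used here is an arbitrary illustrative choice, not what any YAML library does internally.

```python
complex_key = {"a": "x"}  # what the YAML key {a: x} looks like as a plain dict

try:
    mapping = {complex_key: "value"}
except TypeError as exc:
    # dicts are mutable and define no hash, so they cannot be dict keys
    error = str(exc)
    print("TypeError:", error)

# An immutable stand-in for the key is hashable and works as a dict key.
frozen_key = frozenset(complex_key.items())
mapping = {frozen_key: "value"}
print(mapping[frozenset({("a", "x")})])  # value
```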

The above-mentioned snippet, the one @aaaaa is using, will break on the loaded YAML because of the (directly) nested sequences/lists.