Loading and dumping multiple yaml files with ruamel.yaml (python)


Problem description

Using Python 2 (at the moment) and ruamel.yaml 0.13.14 (RedHat EPEL).

I'm currently writing some code to load YAML definitions, but they are split up over multiple files. The user-editable part contains, for example:

users:
  xxxx1:
    timestamp: '2018-10-22 11:38:28.541810'
    << : *userdefaults
  xxxx2:
    << : *userdefaults
    timestamp: '2018-10-22 11:38:28.541810'

The defaults are stored in another file, which is not editable:

userdefaults: &userdefaults
    # Default values for user settings
    fileCountQuota: 1000
    diskSizeQuota: "300g"

I can process these together by loading both files, concatenating the strings, and then running them through merged_data = list(yaml.load_all("{}\n{}".format(defaults_data, user_data), Loader=yaml.RoundTripLoader)), which correctly resolves everything. (When not using RoundTripLoader I get errors that the references cannot be resolved, which is normal.)

Now I want to do some updates via Python code (e.g. update the timestamp), and for that I need to write back just the user part. That's where things get hairy: so far I haven't found a way to write only that YAML document rather than both.

Recommended answer

First of all, unless there are multiple documents in your defaults file, you don't have to use load_all, as you don't concatenate the two documents into a multiple-document stream. If you had, by using a format string with a document-end marker ("{}\n...\n{}") or with a directives-end marker ("{}\n---\n{}"), your aliases would not carry over from one document to another, as per the YAML specification:

It is an error for an alias node to use an anchor that does not previously occur in the document.

The anchor has to be in the document, not just in the stream (which can consist of multiple documents).
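To make the per-document scope concrete, here is a rough sketch that scans a stream for aliases lacking an earlier anchor in the same document. It is not how ruamel.yaml resolves aliases: the helper names `split_documents` and `unresolved_aliases` are hypothetical, and the regular expression ignores quoting and comments, so it only suits simple inputs like the ones in the question.

```python
import re

# Simplified token pattern: '&name' is an anchor, '*name' is an alias.
# Assumption: no '&' or '*' characters appear inside quoted strings or comments.
TOKEN = re.compile(r'([&*])([A-Za-z0-9_-]+)')


def split_documents(stream):
    """Split a YAML stream on bare '---' and '...' marker lines (simplified)."""
    docs, current = [], []
    for line in stream.splitlines():
        if line.rstrip() in ('---', '...'):
            docs.append('\n'.join(current))
            current = []
        else:
            current.append(line)
    docs.append('\n'.join(current))
    return [doc for doc in docs if doc.strip()]


def unresolved_aliases(stream):
    """Return alias names that have no earlier anchor in their own document."""
    missing = []
    for doc in split_documents(stream):
        seen = set()  # anchors are forgotten at every document boundary
        for kind, name in TOKEN.findall(doc):
            if kind == '&':
                seen.add(name)
            elif name not in seen:
                missing.append(name)
    return missing


defaults_data = 'userdefaults: &userdefaults\n    fileCountQuota: 1000\n'
user_data = 'users:\n  xxxx1:\n    << : *userdefaults\n'

one_document = '{}\n{}'.format(defaults_data, user_data)
two_documents = '{}\n---\n{}'.format(defaults_data, user_data)

print(unresolved_aliases(one_document))
print(unresolved_aliases(two_documents))
```

With plain concatenation the alias finds its anchor; once a `---` marker separates the two parts, the alias in the second document has nothing to refer to.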

I tried some hocus pocus, pre-populating the dumper's dictionary of already represented objects with the anchored node:

import sys
import datetime
from ruamel import yaml

def load():
    with open('defaults.yaml') as fp:
        defaults_data = fp.read()
    with open('user.yaml') as fp:
        user_data = fp.read()
    merged_data = yaml.load("{}\n{}".format(defaults_data, user_data), 
                            Loader=yaml.RoundTripLoader)
    return merged_data

class MyRTDGen(object):
    class MyRTD(yaml.RoundTripDumper):
        def __init__(self, *args, **kw):
            pps = kw.pop('pre_populate', None)
            yaml.RoundTripDumper.__init__(self, *args, **kw)
            if pps is not None:
                for pp in pps:
                    try:
                        anchor = pp.yaml_anchor()
                    except AttributeError:
                        anchor = None
                    node = yaml.nodes.MappingNode(
                        u'tag:yaml.org,2002:map', [], flow_style=None, anchor=anchor)
                    self.represented_objects[id(pp)] = node

    def __init__(self, pre_populate=None):
        assert isinstance(pre_populate, list)
        self._pre_populate = pre_populate 

    def __call__(self, *args, **kw):
        kw1 = kw.copy()
        kw1['pre_populate'] = self._pre_populate
        myrtd = self.MyRTD(*args, **kw1)
        return myrtd


def update(md, file_name):
    ud = md.pop('userdefaults')
    MyRTD = MyRTDGen([ud])
    yaml.dump(md, sys.stdout, Dumper=MyRTD)
    with open(file_name, 'w') as fp:
        yaml.dump(md, fp, Dumper=MyRTD)

md = load()
md['users']['xxxx2']['timestamp'] = str(datetime.datetime.utcnow())
update(md, 'user.yaml')

Since the PyYAML-based API requires a class instead of an object, you need to use a class generator, which adds the data elements to pre-populate on the fly from within yaml.dump().

But this doesn't work, as a node only gets written out with an anchor once it is determined that the anchor is used (i.e. there is a second reference). So the first merge key actually gets written out as an anchor. And although I am quite familiar with the code base, I could not get this to work properly in a reasonable amount of time.

So instead, I would just rely on the fact that, at the root level of the dump of the combined, updated data, there is only one key that matches the first key of user.yaml, and strip anything before that key.

import sys
import datetime
from ruamel import yaml

with open('defaults.yaml') as fp:
    defaults_data = fp.read()
with open('user.yaml') as fp:
    user_data = fp.read()
merged_data = yaml.load("{}\n{}".format(defaults_data, user_data), 
                        Loader=yaml.RoundTripLoader)

# find the key
for line in user_data.splitlines():
    line = line.split('# ')[0].rstrip()  # end of line comment, not checking for strings
    if line and line[-1] == ':' and line[0] != ' ':
        split_key = line
        break

merged_data['users']['xxxx2']['timestamp'] = str(datetime.datetime.utcnow())

# dump the combined data to a buffer, then keep only the part from the user key onwards
buf = yaml.compat.StringIO()
yaml.dump(merged_data, buf, Dumper=yaml.RoundTripDumper)
document = split_key + buf.getvalue().split('\n' + split_key)[1]
sys.stdout.write(document)

which gives:

users:
  xxxx1:
    <<: *userdefaults
    timestamp: '2018-10-22 11:38:28.541810'
  xxxx2:
    <<: *userdefaults
    timestamp: '2018-10-23 09:59:13.829978'
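The key-finding and stripping steps can also be separated out as two small string helpers. This is only a sketch of the same idea, using the standard library alone: the function names `first_root_key` and `strip_before_key` are hypothetical, and the sample strings stand in for the contents of user.yaml and for the text produced by the round-trip dump.

```python
def first_root_key(text):
    """Return the first top-level 'key:' line of a YAML text, or None."""
    for line in text.splitlines():
        # Cut off end-of-line comments; like the code above, this does not
        # check whether '# ' occurs inside a quoted string.
        line = line.split('# ')[0].rstrip()
        if line and line[-1] == ':' and line[0] != ' ':
            return line
    return None


def strip_before_key(dumped, key_line):
    """Drop everything in `dumped` that precedes the root-level `key_line`."""
    if dumped.startswith(key_line):
        return dumped
    head, sep, tail = dumped.partition('\n' + key_line)
    if not sep:
        raise ValueError('root key {!r} not found in dump'.format(key_line))
    return key_line + tail


# Stand-ins for the user file and for the dump of the merged data.
user_data = "# user settings\nusers:\n  xxxx1:\n    <<: *userdefaults\n"
dumped = ("userdefaults: &userdefaults\n"
          "  fileCountQuota: 1000\n"
          "users:\n  xxxx1:\n    <<: *userdefaults\n")

key = first_root_key(user_data)
document = strip_before_key(dumped, key)
print(document)
```

Raising an error when the key is missing avoids silently writing an empty or truncated user file, which the one-line split would hide behind an IndexError.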


I had to make a virtualenv to make sure I could run the above with ruamel.yaml==0.13.14. That version is from the time I was still young (I won't claim to have been innocent). There have been over 85 releases of the library since then.

I can understand that you might not be able to run anything but Python 2 at the moment and cannot compile/use a newer version. But what you really should do is install virtualenv (this can be done using EPEL, but also without further "polluting" your system installation), make a virtualenv for the code you are developing, and install the latest version of ruamel.yaml (and your other libraries) in there. You can do the same if you need to distribute your software to other systems: just install virtualenv there as well.

I have all my utilities under /opt/util, managed by virtualenvutils, a wrapper around virtualenv.
