Why does PyYAML use generators to construct objects?

Question

I've been reading the PyYAML source code to try to understand how to define a proper constructor function that I can add with add_constructor. I have a pretty good understanding of how that code works now, but I still don't understand why the default YAML constructors in the SafeConstructor are generators. For example, the method construct_yaml_map of SafeConstructor:

def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)

I understand how the generator is used in BaseConstructor.construct_object as follows to stub out an object and only populate it with data from the node if deep=False is passed to construct_mapping:

    if isinstance(data, types.GeneratorType):
        generator = data
        data = generator.next()
        if self.deep_construct:
            for dummy in generator:
                pass
        else:
            self.state_generators.append(generator)

And I understand how the data is generated in BaseConstructor.construct_document in the case where deep=False for construct_mapping.

def construct_document(self, node):
    data = self.construct_object(node)
    while self.state_generators:
        state_generators = self.state_generators
        self.state_generators = []
        for generator in state_generators:
            for dummy in generator:
                pass

What I don't understand is the benefit of stubbing out the data objects and working down through the objects by iterating over the generators in construct_document. Does this have to be done to support something in the YAML spec, or does it provide a performance benefit?

This answer on another question was somewhat helpful, but I don't understand why that answer does this:

def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)

instead of this:

def foo_constructor(loader, node):
    state = loader.construct_mapping(node, deep=True)
    return Foo(**state)

I've tested that the latter form works for the examples posted on that other answer, but perhaps I am missing some edge case.

I am using version 3.10 of PyYAML, but it looks like the code in question is the same in the latest version (3.12) of PyYAML.

Recommended answer

In YAML you can have anchors and aliases. With that you can make self-referential structures, directly or indirectly.
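As an illustration, the smallest self-referential document is a sequence whose only item is an alias to that same sequence. The sketch below only mimics the resulting structure in plain Python; no YAML library is involved, and the names `yaml_text` and `data` are chosen for this example.

```python
# YAML document this mimics (kept as a string only; it is not parsed here).
yaml_text = "&a [*a]"

# The equivalent Python object cannot be written as a single literal,
# because the list has to exist before it can be placed inside itself.
data = []          # step 1: create the empty object so it has an identity
data.append(data)  # step 2: fill it in, referring to the object itself

print(data[0] is data)
```

The two separate steps here are exactly what a one-shot "build the children, then build the parent" approach cannot express.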

If YAML did not have this possibility of self-reference, you could just construct all the children first and then create the parent structure in one go. But because of self-references, you might not yet have the child you need to fill out the structure that you are creating. By using the two-step process of the generator (I call it two-step because there is only one yield before you reach the end of the method), you can create an object partially and then fill it out with a self-reference, because the object already exists (i.e. its place in memory is defined).

The benefit is not in speed, but purely because of making the self-reference possible.
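The mechanism can be modelled without any YAML library. The following is a deliberately simplified sketch, not PyYAML's actual code: `ToyConstructor`, the dict-based node layout, and both constructor functions are hypothetical names invented for this example. It only mirrors the idea of yielding a stub, registering it, and filling it in later.

```python
import types


class ToyConstructor:
    """Simplified stand-in for a YAML constructor (not PyYAML's real class)."""

    def __init__(self):
        self.constructed = {}     # id(node) -> finished or stubbed object
        self.in_progress = set()  # nodes whose constructor is currently running
        self.pending = []         # generators that still have to fill their stub

    def construct_object(self, node):
        key = id(node)
        if key in self.constructed:
            return self.constructed[key]
        if key in self.in_progress:
            raise RuntimeError("found unconstructable recursive node")
        self.in_progress.add(key)
        data = node["constructor"](self, node)
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)          # run up to the yield: get the stub
            self.pending.append(generator)  # the stub is filled in later
        self.constructed[key] = data
        self.in_progress.discard(key)
        return data

    def construct_document(self, node):
        data = self.construct_object(node)
        while self.pending:
            pending, self.pending = self.pending, []
            for generator in pending:
                for _ in generator:  # resume each generator after its yield
                    pass
        return data


def list_as_generator(ctor, node):
    data = []
    yield data  # the stub is registered before any child is visited
    data.extend(ctor.construct_object(child) for child in node["children"])


def list_in_one_step(ctor, node):
    # Children are built first, so a child that is the node itself cannot resolve.
    return [ctor.construct_object(child) for child in node["children"]]


def make_recursive_node(constructor):
    node = {"constructor": constructor, "children": []}
    node["children"].append(node)  # same shape as the YAML document "&a [*a]"
    return node


result = ToyConstructor().construct_document(make_recursive_node(list_as_generator))
print(result[0] is result)

try:
    ToyConstructor().construct_document(make_recursive_node(list_in_one_step))
except RuntimeError as exc:
    error_message = str(exc)
    print(error_message)
```

In this model the generator variant succeeds because the stub is already in `constructed` when the alias is visited, while the one-step variant finds the node still in progress and has nothing to return.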

If you simplify the example from the answer you refer to a little, the following loads without error:

import sys
import ruamel.yaml as yaml


class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l1, self.l2 = l
        self.d = d


def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)

yaml.add_constructor(u'!Foo', foo_constructor)

x = yaml.load('''
&fooref
!Foo
s: *fooref
l: [1, 2]
d: {try: this}
''', Loader=yaml.Loader)

yaml.dump(x, sys.stdout)

But if you change foo_constructor() to:

def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)
    return instance

(the yield removed and a final return added), you get a ConstructorError with the message:

found unconstructable recursive node 
  in "<unicode string>", line 2, column 1:
    &fooref

PyYAML should give a similar message. Inspect the traceback on that error and you can see where ruamel.yaml/PyYAML tries to resolve the alias in the source code.