Issues working with Python generators and the OpenStack Swift client

Question

I'm having a problem with Python generators while working with the OpenStack Swift client library.

The problem at hand is that I am trying to retrieve a large string of data (about 7MB) from a specific URL, split the string into smaller chunks, and return a generator that yields one chunk per iteration. In the test suite, the data is just a string that is sent to a monkeypatched class of the Swift client for processing.

The code in the monkeypatched class looks like this:

def monkeypatch_class(name, bases, namespace):
    '''Guido's monkeypatch metaclass.'''
    assert len(bases) == 1, "Exactly one base class required"
    base = bases[0]
    for name, value in namespace.iteritems():
        if name != "__metaclass__":
            setattr(base, name, value)
    return base

In the test suite:

from swiftclient import client
import StringIO
import utils

class Connection(client.Connection):
    __metaclass__ = monkeypatch_class

    def get_object(self, path, obj, resp_chunk_size=None, ...):
        contents = None
        headers = {}

        # retrieve content from path and store it in 'contents'
        ...

        if resp_chunk_size is not None:
            # stream the string into chunks
            def _object_body():
                stream = StringIO.StringIO(contents)
                buf = stream.read(resp_chunk_size)
                while buf:
                    yield buf
                    buf = stream.read(resp_chunk_size)
            contents = _object_body()
        return headers, contents

After returning the generator object, it was called by a stream function in the storage class:

class SwiftStorage(Storage):

    def get_content(self, path, chunk_size=None):
        path = self._init_path(path)
        try:
            _, obj = self._connection.get_object(
                self._container,
                path,
                resp_chunk_size=chunk_size)
            return obj
        except Exception:
            raise IOError("Could not get content: {}".format(path))

    def stream_read(self, path):
        try:
            return self.get_content(path, chunk_size=self.buffer_size)
        except Exception:
            raise OSError(
                "Could not read content from stream: {}".format(path))

And finally, in my test suite:

def test_stream(self):
    filename = self.gen_random_string()
    # test 7MB
    content = self.gen_random_string(7 * 1024 * 1024)
    # wrap the content in a file-like object for stream_write
    io = StringIO.StringIO(content)
    self._storage.stream_write(filename, io)
    io.close()
    # test read / write
    data = ''
    for buf in self._storage.stream_read(filename):
        data += buf
    self.assertEqual(content,
                     data,
                     "stream read failed. output: {}".format(data))

The output is as follows:

======================================================================
FAIL: test_stream (test_swift_storage.TestSwiftStorage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bacongobbler/git/github.com/bacongobbler/docker-registry/test/test_local_storage.py", line 46, in test_stream
    "stream read failed. output: {}".format(data))
AssertionError: stream read failed. output: <generator object _object_body at 0x2a6bd20>

I tried isolating this with a simple python script that follows the same flow as the code above, which passed without issues:

def gen_num():
    def _object_body():
        for i in range(10000000):
            yield i
    return _object_body()

def get_num():
    return gen_num()

def stream_read():
    return get_num()

def main():
    num = 0
    for i in stream_read():
        num += i
    print num

if __name__ == '__main__':
    main()

Any help with this issue is greatly appreciated :)

Recommended answer

In your get_object method, you're assigning the return value of _object_body() to the contents variable. However, that variable is also the one that holds your actual data, and it's used early on in _object_body.

The problem is that _object_body is a generator function (it uses yield). When you call it, it produces a generator object, but the code of the function doesn't start running until you iterate over that generator. Because _object_body is a closure, it looks up contents in the enclosing scope when its body runs, not when the function is defined. So by the time the function's code actually starts running (in the for loop in test_stream), you have long since reassigned contents = _object_body().
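
The deferred execution is easy to observe in isolation. The following is a minimal sketch in Python 3 syntax (the question's code is Python 2); the names make_chunks and log are made up for illustration and do not come from the question.

```python
log = []

def make_chunks():
    # Nothing in this body runs until the generator is iterated.
    log.append("body started")
    yield "chunk"

gen = make_chunks()          # only creates the generator object
started_before = list(log)   # snapshot: the body has not run yet

chunks = list(gen)           # iterating is what runs the body
started_after = list(log)

print(started_before)  # []
print(chunks)          # ['chunk']
print(started_after)   # ['body started']
```

Anything the body reads from an enclosing scope is therefore read at iteration time, with whatever value the name has by then.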

Your stream = StringIO.StringIO(contents) therefore creates a StringIO object built from the generator object rather than from the data. Python 2's StringIO.StringIO converts a non-string argument with str(), which is why the string representation of the generator shows up in your error message.

Here's a minimal reproduction case that illustrates the problem:

def foo():
    contents = "Hello!"

    def bar():
        print contents
        yield 1

    # Only create the generator. This line runs none of the code in bar.
    contents = bar()

    print "About to start running..."
    for i in contents:
        # Now we run the code in bar, but contents is now bound to 
        # the generator object. So this doesn't print "Hello!"
        pass
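
One way to avoid the clash, sketched here in Python 3 syntax and assuming the data is a plain string, is to pass the data into the generator function as an argument (simply binding the generator to a different name would also work). Arguments are bound when the generator function is called, so the body no longer depends on a variable that gets rebound later. The helper name chunked, the simplified get_object signature, and the sample data are hypothetical and not part of swiftclient.

```python
import io

def chunked(data, chunk_size):
    """Yield successive pieces of `data`, each at most `chunk_size` characters."""
    # `data` is a parameter, bound at call time, so rebinding the
    # caller's variable afterwards cannot change what is read here.
    stream = io.StringIO(data)
    buf = stream.read(chunk_size)
    while buf:
        yield buf
        buf = stream.read(chunk_size)

def get_object(contents, resp_chunk_size=None):
    # Hypothetical stand-in for the monkeypatched method in the question.
    headers = {}
    if resp_chunk_size is not None:
        # The string is passed in before `contents` is rebound to the generator.
        contents = chunked(contents, resp_chunk_size)
    return headers, contents

_, body = get_object("abcdefghij", resp_chunk_size=4)
pieces = list(body)
print(pieces)           # ['abcd', 'efgh', 'ij']
print("".join(pieces))  # abcdefghij
```

With this change, the loop in test_stream would receive the chunks of the original string, and joining them reproduces the content.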
