Is it safe to combine 'with' and 'yield' in python?

Question

It's a common idiom in Python to use a context manager to automatically close files:

with open('filename') as my_file:
    # do something with my_file

# my_file gets automatically closed after exiting 'with' block

Now I want to read the contents of several files. The consumer of the data does not know or care whether the data comes from files or not-files. It does not want to check whether the objects it receives can be opened; it just wants something to read lines from. So I create an iterator like this:

def select_files():
    """Yields carefully selected and ready-to-read-from files"""
    file_names = [.......]
    for fname in file_names:
        with open(fname) as my_open_file:
            yield my_open_file

This iterator may be used like this:

for file_obj in select_files():
    for line in file_obj:
        # do something useful

(Note that the same code could be used to consume not open files but, for example, lists of strings, which is cool! See the sketch below.)
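For instance, here is a minimal sketch of that idea, using a hypothetical select_buffers generator that stands in for select_files:

def select_buffers():
    """Hypothetical counterpart of select_files that yields
    in-memory 'files' (plain lists of strings) instead."""
    yield ["first line\n", "second line\n"]
    yield ["another line\n"]

for file_obj in select_buffers():
    for line in file_obj:
        pass  # do something useful, exactly as with real files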

The question is: is it safe to yield open files?

It seems like "why not?". The consumer calls the iterator, the iterator opens a file and yields it to the consumer. The consumer processes the file and comes back to the iterator for the next one. The iterator code resumes, we exit the 'with' block, the my_open_file object gets closed, we move on to the next file, and so on.

But what if the consumer never comes back to the iterator for the next file? For example, an exception occurred inside the consumer. Or the consumer found something very exciting in one of the files and happily returned the results to whoever called it?

In that case the iterator code would never resume, we would never reach the end of the 'with' block, and the my_open_file object would never get closed!
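For example, a sketch of that worry with a hypothetical find_keyword consumer: it returns from inside the loop, so the generator stays suspended at its yield and the 'with' block around the open file never exits:

def find_keyword(keyword):
    for file_obj in select_files():
        for line in file_obj:
            if keyword in line:
                # Returning here abandons the generator while it is still
                # suspended at 'yield my_open_file' inside its 'with' block.
                return line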

Or would it?

Answer

You bring up a criticism that has been raised before [1]. The cleanup in this case is non-deterministic, but it will happen with CPython when the generator gets garbage collected. Your mileage may vary for other Python implementations...

Here's a quick example:

from __future__ import print_function
import contextlib

@contextlib.contextmanager
def manager():
    """Easiest way to get a custom context manager..."""
    try:
        print('Entered')
        yield
    finally:
        print('Closed')


def gen():
    """Just a generator with a context manager inside.

    When the context is entered, we'll see "Entered" on the console
    and when exited, we'll see "Closed" on the console.
    """
    man = manager()
    with man:
        for i in range(10):
            yield i


# Test what happens when we consume a generator.
list(gen())

def fn():
    g = gen()
    next(g)
    # g.close()

# Test what happens when the generator gets garbage collected inside
# a function
print('Start of Function')
fn()
print('End of Function')

# Test what happens when a generator gets garbage collected outside
# a function.  IIRC, this isn't _guaranteed_ to happen in all cases.
g = gen()
next(g)
# g.close()
print('EOF')

Running this script in CPython, I get:

$ python ~/sandbox/cm.py
Entered
Closed
Start of Function
Entered
Closed
End of Function
Entered
EOF
Closed

Basically, what we see is that for generators that are exhausted, the context manager cleans up when you expect. For generators that aren't exhausted, the cleanup function runs when the generator is collected by the garbage collector. This happens when the generator goes out of scope (or, IIRC, at the next gc.collect cycle at the latest).
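As a minimal sketch of that, reusing gen() from the example above (the exact timing is an implementation detail):

import gc

g = gen()     # create another generator
next(g)       # prints 'Entered'; the generator is now suspended at 'yield'
del g         # on CPython, the reference count drops to zero and the
              # generator is finalized right away, so 'Closed' prints here
gc.collect()  # on other implementations, cleanup may only happen on a
              # collection cycle like this one, or not at all before exit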

However, doing some quick experiments (e.g. running the above code in pypy), I don't get all of my context managers cleaned up:

$ pypy --version
Python 2.7.10 (f3ad1e1e1d62, Aug 28 2015, 09:36:42)
[PyPy 2.6.1 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
$ pypy ~/sandbox/cm.py
Entered
Closed
Start of Function
Entered
End of Function
Entered
EOF

So, the assertion that the context manager's __exit__ will get called on all python implementations is untrue. Likely the misses here are attributable to pypy's garbage collection strategy (which isn't reference counting): by the time pypy decides to reap the generators, the process is already shutting down and therefore it doesn't bother with them... In most real-world applications, the generators would probably get reaped and finalized quickly enough that it doesn't actually matter...

If you want to guarantee that your context manager is finalized properly, you should take care to close the generator when you are done with it [2]. Uncommenting the g.close() lines above gives me deterministic cleanup, because a GeneratorExit is raised at the yield statement (which is inside the context manager) and is then caught/suppressed by the generator...

$ pypy ~/sandbox/cm.py
Entered
Closed
Start of Function
Entered
Closed
End of Function
Entered
Closed
EOF

$ python3 ~/sandbox/cm.py
Entered
Closed
Start of Function
Entered
Closed
End of Function
Entered
Closed
EOF

$ python ~/sandbox/cm.py
Entered
Closed
Start of Function
Entered
Closed
End of Function
Entered
Closed
EOF


FWIW, this means you can use contextlib.closing to clean up your generator:

from contextlib import closing
with closing(gen_function()) as items:
    for item in items:
        pass # Do something useful!
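Applied to the select_files generator from the question, that would look something like this (a sketch): closing() calls .close() on the generator even if the loop exits early, which raises GeneratorExit at the suspended yield and lets the inner 'with open(...)' block close the current file:

from contextlib import closing

with closing(select_files()) as files:
    for file_obj in files:
        for line in file_obj:
            pass  # do something useful; an early return or exception
                  # here still ends up closing the currently open file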

[1] Most recently, some discussion has revolved around PEP 533, which aims to make iterator cleanup more deterministic.
[2] It is perfectly OK to close an already closed and/or consumed generator, so you can call it without worrying about the state of the generator.
