Python Subprocess: how/when do they close file?
Question
I wonder why subprocesses keep so many files open. I have an example in which some files seem to remain open forever (after the subprocess finishes and even after the program crashes).
Consider the following code:
import asyncio
import aiofiles
import tempfile

async def main():
    return [await fds_test(i) for i in range(2000)]

async def fds_test(index):
    print(f"Writing {index}")
    handle, temp_filename = tempfile.mkstemp(suffix='.dat', text=True)
    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')
        await fp.write('other stuff')
        await fp.write('EOF\n')

    print(f"Reading {index}")
    bash_cmd = 'cat {}'.format(temp_filename)
    process = await asyncio.create_subprocess_exec(*bash_cmd.split(), stdout=asyncio.subprocess.DEVNULL, close_fds=True)
    await process.wait()
    print(f"Process terminated {index}")

if __name__ == "__main__":
    asyncio.run(main())
This spawns processes one after the other (sequentially). I expect the number of files simultaneously opened by this to also be one. But it's not the case and at some point I get the following error:
/Users/cglacet/.pyenv/versions/3.8.0/lib/python3.8/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
1410 # Data format: "exception name:hex errno:description"
1411 # Pickle is not used; it is complex and involves memory allocation.
-> 1412 errpipe_read, errpipe_write = os.pipe()
1413 # errpipe_write must not be in the standard io 0, 1, or 2 fd range.
1414 low_fds_to_close = []
OSError: [Errno 24] Too many open files
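For reference, the soft limit that this error runs into can be inspected (and, as a stopgap, raised) with the standard-library resource module; this is a diagnostic sketch, not part of the original question:

```python
import resource

# "Too many open files" means the process hit its soft limit on open
# file descriptors; resource exposes both the soft and hard limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# An unprivileged process may raise its soft limit up to the hard limit,
# but that only postpones the crash if descriptors are leaking.
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(soft + 1024, hard), hard))
```

Raising the limit treats the symptom only; the leak itself still has to be found.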
I tried running the same code without the option stdout=asyncio.subprocess.DEVNULL, but it still crashes. This answer suggested it might be where the problem comes from, and the error also points at the line errpipe_read, errpipe_write = os.pipe(). But it doesn't seem like this is the problem (running without that option gives the same error).
In case you need more information, here is an overview from the output of lsof | grep python:
python3.8 19529 cglacet 7u REG 1,5 138 12918796819 /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/tmpuxu_o4mf.dat
# ...
# ~ 2000 entries later :
python3.8 19529 cglacet 2002u REG 1,5 848 12918802386 /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/tmpcaakgz3f.dat
These are the temporary files that my subprocesses are reading. The rest of the output from lsof seems like legit stuff (libraries opened, like pandas/numpy/scipy/etc.).
Now I have some doubt: maybe the problem comes from the aiofiles asynchronous context manager? Maybe it's the one not closing the files, and not create_subprocess_exec?
There is a similar question here, but nobody really tries to explain/solve the problem (they only suggest increasing the limit): Python Subprocess: Too Many Open Files. I would really like to understand what is going on. My first goal is not necessarily to solve the problem temporarily (in the future I want to be able to run the function fds_test as many times as needed); my goal is to have a function that behaves as expected. I probably have to change either my expectation or my code, which is why I'm asking this question.
As suggested in the comments here, I also tried to run python -m test test_subprocess -m test_close_fds -v, which gives:
== CPython 3.8.0 (default, Nov 28 2019, 20:06:13) [Clang 11.0.0 (clang-1100.0.33.12)]
== macOS-10.14.6-x86_64-i386-64bit little-endian
== cwd: /private/var/folders/sn/_pq5fxn96kj3m135j_b76sb80000gp/T/test_python_52961
== CPU count: 8
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 5.29 Run tests sequentially
0:00:00 load avg: 5.29 [1/1] test_subprocess
test_close_fds (test.test_subprocess.POSIXProcessTestCase) ... ok
test_close_fds (test.test_subprocess.Win32ProcessTestCase) ... skipped 'Windows specific tests'
----------------------------------------------------------------------
Ran 2 tests in 0.142s
OK (skipped=1)
== Tests result: SUCCESS ==
1 test OK.
Total duration: 224 ms
Tests result: SUCCESS
So it seems files should be correctly closed; I'm a bit lost here.
Answer
The problem doesn't come from create_subprocess_exec. The problem in this code is that tempfile.mkstemp() actually opens the file:
mkstemp() returns a tuple containing an OS-level handle to an open file (as would be returned by os.open()) …
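This behaviour is easy to verify in isolation (a small illustration, not from the original post):

```python
import os
import tempfile

# mkstemp() creates the file AND returns an already-open OS-level
# file descriptor; the caller is responsible for closing it.
handle, temp_filename = tempfile.mkstemp(suffix='.dat', text=True)

# The descriptor is live: fstat() succeeds and shows an empty file.
size = os.fstat(handle).st_size
print(size)  # 0

# Without this explicit close, every call leaks one descriptor.
os.close(handle)
os.unlink(temp_filename)
```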
I thought it would only create the file. To solve my problem, I simply added a call to os.close(handle), which removes the error but is a bit weird (the file is opened twice). So I rewrote it as:
import asyncio
import aiofiles
import tempfile
import uuid

async def main():
    await asyncio.gather(*[fds_test(i) for i in range(10)])

async def fds_test(index):
    dir_name = tempfile.gettempdir()
    file_id = f"{tempfile.gettempprefix()}{uuid.uuid4()}"
    temp_filename = f"{dir_name}/{file_id}.dat"

    async with aiofiles.open(temp_filename, mode='w') as fp:
        await fp.write('stuff')

    bash_cmd = 'cat {}'.format(temp_filename)
    process = await asyncio.create_subprocess_exec(*bash_cmd.split(), close_fds=True)
    await process.wait()

if __name__ == "__main__":
    asyncio.run(main())
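An alternative sketch (my own variation, not from the original answer) avoids hand-building the path: let NamedTemporaryFile create the file and close it immediately, keeping only its name via delete=False:

```python
import os
import tempfile

# NamedTemporaryFile opens the file, but the context manager closes it;
# delete=False keeps the file on disk so a subprocess can read it later.
with tempfile.NamedTemporaryFile(suffix='.dat', delete=False) as tmp:
    temp_filename = tmp.name

# The descriptor is closed at this point; only the path remains.
with open(temp_filename, 'w') as fp:
    fp.write('stuff')

# ... hand temp_filename to the subprocess, then clean up:
os.unlink(temp_filename)
```

This keeps the "unique temporary name" guarantee while never holding more than one descriptor at a time.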
Now I wonder why the error was raised by subprocess and not tempfile.mkstemp. Maybe it's because subprocess opens so many more files that it makes it unlikely that the temporary-file creation is what breaks the limit …
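One way to test such a hypothesis is to count the process's open descriptors directly; a platform-dependent sketch (the fd directory is /proc/self/fd on Linux and /dev/fd on macOS):

```python
import os

def count_open_fds():
    # List this process's open descriptors; the listing itself briefly
    # opens one extra descriptor, which cancels out when diffing counts.
    fd_dir = "/proc/self/fd" if os.path.exists("/proc/self/fd") else "/dev/fd"
    return len(os.listdir(fd_dir))

before = count_open_fds()
leaked = [os.open(os.devnull, os.O_RDONLY) for _ in range(10)]
after = count_open_fds()
print(after - before)  # 10

for fd in leaked:
    os.close(fd)
```

Sampling this counter before and after each mkstemp/subprocess call would show exactly which step is accumulating descriptors.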