Python subprocess.Popen "OSError: [Errno 12] 无法分配内存"; [英] Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"

查看:47
本文介绍了Python subprocess.Popen "OSError: [Errno 12] 无法分配内存";的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注意:这个问题最初是问此处 但即使实际上没有找到可接受的答案,赏金时间也已过期.我正在重新提出这个问题,包括原始问题中提供的所有详细信息.

Note: This question was originally asked here but the bounty time expired even though an acceptable answer was not actually found. I am re-asking this question including all details provided in the original question.

python 脚本使用 sched 每 60 秒运行一组类函数模块:

A python script is running a set of class functions every 60 seconds using the sched module:

# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))

脚本作为守护进程运行,使用代码此处.

The script is running as a daemonised process using the code here.

作为 doChecks 一部分调用的许多类方法使用 subprocess 模块调用系统函数以获取系统统计信息:

A number of class methods that are called as part of doChecks use the subprocess module to call system functions in order to get system statistics:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]

在整个脚本因以下错误而崩溃之前运行了一段时间:

This runs fine for a period of time before the entire script crashing with the following error:

File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory

一旦脚本崩溃,服务器上 free -m 的输出是:

The output of free -m on the server once the script has crashed is:

$ free -m
                  total       used       free     shared     buffers    cached
Mem:                894        345        549          0          0          0
-/+ buffers/cache:  345        549
Swap:                 0          0          0

服务器正在运行 CentOS 5.3.我无法在我自己的 CentOS 机器上进行复制,也无法在任何其他用户报告相同问题的情况下进行复制.

The server is running CentOS 5.3. I am unable to reproduce on my own CentOS boxes nor with any other user reporting the same problem.

按照原始问题中的建议,我尝试了许多方法来调试此问题:

I have tried a number of things to debug this as suggested in the original question:

  1. 在 Popen 调用之前和之后记录 free -m 的输出.内存使用没有显着变化,即内存不会随着脚本运行逐渐用完.

  1. Logging the output of free -m before and after the Popen call. There is no significant change in memory usage i.e. memory is not gradually being used up as the script runs.

我在 Popen 调用中添加了 close_fds=True 但这没有任何区别 - 脚本仍然因相同的错误而崩溃.建议在此处这里.

I added close_fds=True to the Popen call but this made no difference - the script still crashed with the same error. Suggested here and here.

我按照建议检查了 RLIMIT_DATA 和 RLIMIT_AS 上都显示 (-1, -1) 的 rlimits 此处.

I checked the rlimits which showed (-1, -1) on both RLIMIT_DATA and RLIMIT_AS as suggested here.

一篇文章表明没有交换空间可能是原因,但交换实际上是按需提供的(根据网络主机),这也被认为是一个虚假的原因此处.

An article suggested the having no swap space might be the cause but swap is actually available on demand (according to the web host) and this was also suggested as a bogus cause here.

进程正在关闭,因为这是使用 .communicate() 的行为,由 Python 源代码和注释支持 此处.

The processes are being closed because that is the behaviour of using .communicate() as backed up by the Python source code and comments here.

可以在 GitHub 上找到完整的检查使用第 442 行定义的 getProcesses 函数.这由 doChecks() 从第 520 行开始调用.

The entire checks can be found at on GitHub here with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.

该脚本在崩溃前使用 strace 运行,输出如下:

The script was run with strace with the following output before the crash:

recv(4, "Total Accesses: 516662
Total kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4)                                = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5])                            = 0
pipe([6, 7])                            = 0
fcntl64(7, F_GETFD)                     = 0
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File "/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File "/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File "/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File "/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, """"A generally useful event sche"..., 4096) = 4054
write(2, "    ", 4)                     = 4
write(2, "void = action(*argument)
", 25) = 25
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File "/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File "/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File "/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:
        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0
try"..., 4096) = 4096
read(8, " p2cread
        # c2pread    <-"..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "errread, errwrite)
", 19)    = 19
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File "/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:
        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0
try"..., 4096) = 4096
read(8, " p2cread
        # c2pread    <-"..., 4096) = 4096
read(8, "table(self, handle):
           "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None
         "..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "self.pid = os.fork()
", 21)  = 21
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
write(2, "OSError", 7)                  = 7
write(2, ": ", 2)                       = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "
", 1)                       = 1
unlink("/var/run/sd-agent.pid")         = 0
close(3)                                = 0
munmap(0xb7e0d000, 4096)                = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000)                          = 0xa022000
exit_group(1)                           = ?

推荐答案

作为一般规则(即在 vanilla 内核中),fork/clone 使用 失败ENOMEM 具体发生 因为任一老实说内存不足的情况 (dup_mm, dup_task_struct, alloc_pid, mpol_dupmm_init 等等.croak),或者因为 security_vm_enough_memory_mm 失败了你强制执行过度使用政策.

As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest to God out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init etc. croak), or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.

首先检查分叉失败进程的 vmsize,在分叉尝试时,然后与空闲内存量(物理和交换)进行比较,因为它与过量使用策略相关(插入数字.)

Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in.)

在您的特定情况下,请注意 Virtuozzo 具有 额外检查 过度执行.此外,我不确定您真正拥有多少控制权,从内部您的容器,对交换和过量使用配置(为了影响执行的结果.)

In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement.)

现在,为了真正向前迈进,我想说您还有两个选择:

Now, in order to actually move forward I'd say you're left with two options:

  • 切换到更大的实例,或
  • 投入一些编码工作以更有效地控制脚本的内存占用空间

注意,如果结果证明不是您,而是其他人在您运行 amock 时在同一服务器上的不同实例中并置,那么编码工作可能一无所获.

NOTE that the coding effort may be all for naught if it turns out that it's not you, but some other guy collocated in a different instance on the same server as you running amock.

内存方面,我们已经知道 subprocess.Popen 使用 fork/clone 幕后,这意味着每次你调用它时你都是再次请求与 Python 已经耗尽的内存一样多的内存,即数百个额外的 MB,所有这些都是为了 exec 一个微不足道的 10kB 可执行文件,例如 freeps.在不利的过度使用策略的情况下,您很快就会看到 ENOMEM.

Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM.

没有这个父页表等复制问题的fork的替代方案是vforkposix_spawn.但是,如果您不想根据 vfork/posix_spawn 重写 subprocess.Popen 的块,请考虑使用 suprocess.Popen 仅在脚本开头一次(当 Python 的内存占用最少时),生成一个 shell 脚本,然后运行 ​​free/ps/sleep 以及与您的脚本并行的循环中的任何其他内容;轮询脚本的输出或同步读取它,如果您有其他需要异步处理的事情,可能从一个单独的线程中读取它——在 Python 中处理您的数据,但将分叉留给从属进程.

Alternatives to fork that do not have this parent page tables etc. copy problem are vfork and posix_spawn. But if you do not feel like rewriting chunks of subprocess.Popen in terms of vfork/posix_spawn, consider using suprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously -- do your data crunching in Python but leave the forking to the subordinate process.

但是,在您的特定情况下,您可以完全跳过调用 psfree;您可以直接从 procfs,无论您选择自己访问还是通过 现有的库和/或包.如果 psfree 是您运行的唯一实用程序,那么您可以完全取消 subprocess.Popen.

HOWEVER, in your particular case you can skip invoking ps and free altogether; that information is readily available to you in Python directly from procfs, whether you choose to access it yourself or via existing libraries and/or packages. If ps and free were the only utilities you were running, then you can do away with subprocess.Popen completely.

最后,就 subprocess.Popen 而言,无论您做什么,如果您的脚本泄漏内存,您最终还是会碰壁.密切关注它,并检查内存泄漏.

Finally, whatever you do as far as subprocess.Popen is concerned, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.

这篇关于Python subprocess.Popen "OSError: [Errno 12] 无法分配内存";的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆