Python子进程.Popen"OSError:[Errno 12]无法分配内存". [英] Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"

查看:74
本文介绍了Python子进程.Popen"OSError:[Errno 12]无法分配内存".的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注意:最初是

Note: This question was originally asked here but the bounty time expired even though an acceptable answer was not actually found. I am re-asking this question including all details provided in the original question.

python脚本使用 sched 每60秒运行一组类函数模块:

A python script is running a set of class functions every 60 seconds using the sched module:

# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))

该脚本使用代码此处作为守护进程运行.

The script is running as a daemonised process using the code here.

在doChecks中调用的许多类方法都使用 subprocess 模块调用系统功能以获得系统统计信息:

A number of class methods that are called as part of doChecks use the subprocess module to call system functions in order to get system statistics:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]

在整个脚本因以下错误崩溃之前的一段时间内运行良好:

This runs fine for a period of time before the entire script crashing with the following error:

File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory

脚本崩溃后,服务器上free -m的输出为:

The output of free -m on the server once the script has crashed is:

$ free -m
                  total       used       free     shared     buffers    cached
Mem:                894        345        549          0          0          0
-/+ buffers/cache:  345        549
Swap:                 0          0          0

服务器正在运行CentOS 5.3.我无法在自己的CentOS盒子上或任何其他报告相同问题的用户上进行复制.

The server is running CentOS 5.3. I am unable to reproduce on my own CentOS boxes nor with any other user reporting the same problem.

按照原始问题的建议,我尝试了许多方法来调试此问题:

I have tried a number of things to debug this as suggested in the original question:

  1. 在Popen调用之前和之后记录free -m的输出.内存使用没有明显变化,即脚本运行时内存不会逐渐消耗完.

  1. Logging the output of free -m before and after the Popen call. There is no significant change in memory usage i.e. memory is not gradually being used up as the script runs.

我在Popen调用中添加了close_fds = True,但这没有什么不同-脚本仍然因相同的错误而崩溃.建议在此处此处 .

I added close_fds=True to the Popen call but this made no difference - the script still crashed with the same error. Suggested here and here.

我按照建议 此处.

An article suggested the having no swap space might be the cause but swap is actually available on demand (according to the web host) and this was also suggested as a bogus cause here.

进程已关闭,因为这是使用.communicate()的行为,该行为由Python源代码和注释支持

The processes are being closed because that is the behaviour of using .communicate() as backed up by the Python source code and comments here.

可以在 GitHub此处上找到完整的支票使用第442行定义的getProcesses函数.从第520行开始,由doChecks()调用.

The entire checks can be found at on GitHub here with the getProcesses function defined from line 442. This is called by doChecks() starting at line 520.

该脚本在崩溃前与strace一起运行并具有以下输出:

The script was run with strace with the following output before the crash:

recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4)                                = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5])                            = 0
pipe([6, 7])                            = 0
fcntl64(7, F_GETFD)                     = 0
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
write(2, "    ", 4)                     = 4
write(2, "void = action(*argument)\n", 25) = 25
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "  File \"/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "errread, errwrite)\n", 19)    = 19
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
read(8, "table(self, handle):\n           "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None\n         "..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "self.pid = os.fork()\n", 21)  = 21
close(8)                                = 0
munmap(0xb7d28000, 4096)                = 0
write(2, "OSError", 7)                  = 7
write(2, ": ", 2)                       = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "\n", 1)                       = 1
unlink("/var/run/sd-agent.pid")         = 0
close(3)                                = 0
munmap(0xb7e0d000, 4096)                = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000)                          = 0xa022000
exit_group(1)                           = ?

推荐答案

作为一般规则(例如,在普通内核中),fork/clone失败,并带有ENOMEM

As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest to God out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init etc. croak), or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.

首先,在进行分叉尝试时检查未能分叉的进程的vmsize,然后将其与过量使用策略相关的可用内存量(物理和交换)进行比较(将数字插入)

Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in.)

在您的特定情况下,请注意Virtuozzo具有其他检查 .c"rel =" noreferrer>过量提交实施.而且,我不确定从容器内对

In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement.)

现在,为了真正前进,我会说你有两个选择:

Now, in order to actually move forward I'd say you're left with two options:

  • 切换到更大的实例,或者
  • 投入一些编码工作来更有效地控制脚本的内存占用空间
  • switch to a larger instance, or
  • put some coding effort into more effectively controlling your script's memory footprint

注意,如果事实并非是您,而是其他人在运行amock的同一台服务器上的其他实例中并置在一起,那么编码工作可能就一事无成.

NOTE that the coding effort may be all for naught if it turns out that it's not you, but some other guy collocated in a different instance on the same server as you running amock.

在内存方面,我们已经知道 subprocess.Popen使用fork/clone

Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM.

没有该父页面表等的fork替代品.复制问题为 posix_spawn .但是,如果您不想用vfork/posix_spawn来重写subprocess.Popen的块,请考虑在脚本开始时(Python的内存占用最小),仅使用suprocess.Popen一次,以 >产生一个shell脚本,然后在与您的脚本并行的循环中运行free/ps/sleep和其他任何内容;轮询脚本的输出或同步读取它,如果您还有其他需要异步处理的内容,则可能从一个单独的线程中读取它-使用Python处理数据,但将分支留给下属进程.

Alternatives to fork that do not have this parent page tables etc. copy problem are vfork and posix_spawn. But if you do not feel like rewriting chunks of subprocess.Popen in terms of vfork/posix_spawn, consider using suprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously -- do your data crunching in Python but leave the forking to the subordinate process.

如何,在您的特定情况下,您可以完全跳过调用psfree的操作;直接从 procfs ,无论您选择自己访问还是通过现有库和/或包.如果psfree是您正在运行的唯一实用程序,则可以完全放弃subprocess.Popen .

HOWEVER, in your particular case you can skip invoking ps and free altogether; that information is readily available to you in Python directly from procfs, whether you choose to access it yourself or via existing libraries and/or packages. If ps and free were the only utilities you were running, then you can do away with subprocess.Popen completely.

最后,就subprocess.Popen而言,无论您做什么,如果脚本泄漏内存,最终您仍然会陷入困境.请密切注意,并检查内存泄漏.

Finally, whatever you do as far as subprocess.Popen is concerned, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.

这篇关于Python子进程.Popen"OSError:[Errno 12]无法分配内存".的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆