Memory not getting reclaimed on restart of a process
Question
I have a Python job that runs a Caffe net for image processing on NVIDIA GPUs. The job takes images from a RabbitMQ queue, processes them, and writes the results to another queue. When I restart this job, the processes are killed but the memory is not reclaimed.
So after a certain number of restarts the machine crashes. Once I kill the job, no Python process shows up in ps or top, but the CPU memory is still not reclaimed.
How do I debug this issue?
Edit: CPU memory
Recommended answer
It's your GPU memory that is not being freed. Get the process ID with
$ nvidia-smi
and then
$ kill -9 <process id>
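If reading the nvidia-smi table by eye gets tedious, the lookup step can be scripted. The following is only a rough sketch, not part of the original answer: it assumes nvidia-smi is on the PATH and supports the `--query-compute-apps` CSV query, and the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "pid, process_name, used_memory" line
# per compute process still holding GPU memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' lines into (pid, name, memory) tuples."""
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip blank or malformed lines instead of guessing at their meaning.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        rows.append((int(parts[0]), parts[1], parts[2]))
    return rows


def list_gpu_processes():
    """Return the processes nvidia-smi reports, or [] if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, mem in list_gpu_processes():
        print(pid, name, mem)
```

Any PID this prints that belongs to the old job can then be passed to kill -9 as shown above. The script deliberately only lists processes and leaves the kill as a manual step, so nothing is terminated by accident.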