Out-of-memory error in parfor: kill the slave, not the master

Question

When an out-of-memory error is raised in a parfor, is there any way to kill only one Matlab slave to free some memory instead of having the entire script terminate?

Here is what happens by default when an out-of-memory error occurs in a parfor: the script terminates, as shown in the screenshot below.

I wish there were a way to kill just one slave (i.e. remove a worker from the parpool), or to stop using it, so that as much of its memory as possible is released.

Recommended answer

If you get an out-of-memory error in the master process, there is no chance to fix it. For an out-of-memory error on a slave, this should do it:

The simple idea of the code: restart the parfor again and again with the missing data until you have all results. If one iteration fails, a flag (a file) is written, which makes all remaining iterations throw an error as soon as the first error has occurred. This way we get "out of the loop" without wasting time producing further out-of-memory errors.

% Your intended iterator
iterator = 1:10;
% Flags that indicate which iterations have succeeded
succeeded = false(size(iterator));
% Result array
result = nan(size(iterator));
% Name of the flag file that signals that some worker has failed
FLAG = 'ANY_WORKER_CRASHED';
while ~all(succeeded)
    fprintf('Another try\n')
    % Determine which iterations still have to be done
    todo = iterator(~succeeded);
    % Initialize the array for the remaining results
    partresult = nan(size(todo));
    % Initialize the flags that indicate which iterations succeeded (we
    % cannot throw errors here, because that would throw away the results)
    partsucceeded = false(size(todo));
    % The flag file indicates that some worker crashed. A file-based
    % solution is used because I don't know a better one.
    % Remove a stale flag from the previous pass (delete warns if the
    % file does not exist, hence the check).
    if exist(FLAG, 'file')
        delete(FLAG);
    end
    try
        parfor falseindex = 1:numel(todo)
            realindex = todo(falseindex);
            try
                % The flag is used to let all other workers jump out of
                % the loop as soon as one calculation has crashed.
                if exist(FLAG, 'file')
                    error('some other worker crashed');
                end
                % Insert your code here
                % Dummy code that randomly throws an exception
                if rand < .5
                    error('hit out of memory')
                end
                partresult(falseindex) = realindex*2;
                % End of user code
                partsucceeded(falseindex) = true;
                fprintf('trying to run %d and succeeded\n', realindex)
            catch
                % Catch errors within workers to preserve finished work
                partresult(falseindex) = nan;
                partsucceeded(falseindex) = false;
                fprintf('trying to run %d but it failed\n', realindex)
                % Create the (empty) flag file
                fclose(fopen(FLAG, 'w'));
            end
        end
    catch
        % Reduce the pool size by 1. The original answer used matlabpool,
        % which is no longer available in current MATLAB releases;
        % gcp/parpool are its replacements. The pool is never shrunk
        % below one worker.
        pool = gcp('nocreate');
        if ~isempty(pool) && pool.NumWorkers > 1
            newsize = pool.NumWorkers - 1;
            delete(pool);
            parpool(newsize);
        end
    end
    % Put the results of the current pass into the full result
    result(~succeeded) = partresult;
    succeeded(~succeeded) = partsucceeded;
end
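
The bookkeeping with the `succeeded` mask can be easier to follow without a pool. Below is a minimal serial sketch of the same retry logic, not part of the original answer. It assumes a hypothetical `failOnce` vector that stands in for iterations which run out of memory on their first attempt only; a plain `for` replaces the `parfor`, so there is no flag file and no pool resizing.

```matlab
% Serial sketch of the retry bookkeeping (no parallel pool needed).
% failOnce is a hypothetical failure pattern: true means "this iteration
% fails on its first attempt", simulating an out-of-memory error.
iterator  = 1:6;
succeeded = false(size(iterator));
result    = nan(size(iterator));
failOnce  = [false true false false true false];
passes    = 0;
while ~all(succeeded)
    passes = passes + 1;
    % Only the iterations that have not succeeded yet are retried
    todo          = iterator(~succeeded);
    partresult    = nan(size(todo));
    partsucceeded = false(size(todo));
    for k = 1:numel(todo)
        idx = todo(k);
        if failOnce(idx)
            % Simulated failure; clear it so that the next pass succeeds
            failOnce(idx) = false;
        else
            partresult(k)    = idx*2;
            partsucceeded(k) = true;
        end
    end
    % Write back through the old mask first, then update the mask itself
    result(~succeeded)    = partresult;
    succeeded(~succeeded) = partsucceeded;
end
fprintf('finished after %d passes\n', passes);
```

The order of the last two assignments matters: `result` has to be filled through the mask as it was at the start of the pass, so `succeeded` is updated only afterwards.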
