Sending data to workers

Problem description

I am trying to write parallel code to speed up the processing of a very large array (a couple of hundred million rows). To parallelise this, I chopped my data into 8 pieces (my number of cores) and tried sending one piece to each worker. Looking at my RAM usage, however, it seems that every piece is sent to every worker, which effectively multiplies my RAM usage by 8. A minimum working example:

A = 1:16;
data = cell(1,8); % preallocate the cell array of pieces
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii); % piece ii holds elements 2*ii-1 and 2*ii
end

Now, when I send this data to the workers using parfor, it seems to send the full cell array instead of only the desired piece:

output = cell(1,8);
parfor ii = 1:8
    output{ii} = data{ii};
end

In my real code I call a function inside the parfor loop, but this example illustrates the case. Does MATLAB actually send the full cell array data to each worker, and if so, how can I make it send only the desired piece?
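For reference, MATLAB's parfor documentation classifies a variable indexed as data{ii}, with the loop variable itself as the index, as a sliced input variable: each worker receives only the slices it needs. A variable that is indexed in any other way and is not assigned inside the loop becomes a broadcast variable, and the whole array is copied to every worker. The minimal sketch below contrasts the two forms on the question's toy data; output2 and the mod-based index are hypothetical, added only to force the broadcast case.

```matlab
% Rebuild the toy data from the question
A = 1:16;
data = cell(1,8);
for ii = 1:8
    data{ii} = A(2*ii-1:2*ii);
end

% Sliced: data{ii} is indexed directly by the loop variable, so each
% worker is sent only the elements of data it uses
output = cell(1,8);
parfor ii = 1:8
    output{ii} = data{ii};
end

% Broadcast: mod(ii,8)+1 is not of the form ii, ii+k or ii-k, so data
% cannot be sliced and the entire cell array goes to every worker
output2 = cell(1,8);
parfor ii = 1:8
    output2{ii} = data{mod(ii,8)+1};
end
```

In the MATLAB editor, the Code Analyzer flags broadcast variables inside parfor loops, which is a quick way to check how a real loop classifies data.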

Recommended answer

In my experience, parfeval is more economical with memory than parfor. Your problem also looks easy to break into independent pieces, so you can use parfeval to submit many smaller jobs to the MATLAB workers.

Suppose you have workerCnt MATLAB workers and jobCnt jobs to hand to them. Let data be a jobCnt x 1 cell array in which each element is the input to a function getOutput that performs the analysis. The results are stored in a jobCnt x 1 cell array output.
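One way to build such a jobCnt x 1 data cell array from a single large matrix is sketched below. It assumes that the rows can be processed independently; the random matrix A and the value jobCnt = 32 are hypothetical stand-ins for the real data and job count.

```matlab
% Hypothetical stand-ins: a random matrix in place of the real data and
% a job count larger than the number of workers
A = rand(1e5, 3);
jobCnt = 32;

% Row boundaries from 0 to size(A,1), split as evenly as possible
bounds = round(linspace(0, size(A,1), jobCnt + 1));
rowsPerJob = diff(bounds);   % rows in each piece; these sum to size(A,1)

% Split A into a jobCnt x 1 cell array of consecutive row blocks
data = mat2cell(A, rowsPerJob, size(A,2));
```

Choosing more jobs than workers lets parfeval keep every worker busy: when one finishes a piece early, it simply picks up the next queued job.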

In the following code, the jobs are submitted in the for loop and the results are collected in the while loop. The logical array doneJobs records which jobs have finished.

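poolObj = parpool(workerCnt);
jobCnt = numel(data); % number of jobs
output = cell(jobCnt,1);
future(1:jobCnt) = parallel.FevalFuture; % preallocate the array of futures
for jobNo = 1:jobCnt
    % submit job jobNo; request one output, since only one is fetched below
    future(jobNo) = parfeval(poolObj,@getOutput,1,data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    % fetchNext waits for an unread future to finish and returns its index
    [idx,result] = fetchNext(future);
    output{idx} = result;
    doneJobs(idx) = true;
end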

If you want to save even more memory, you can take this approach one step further: after fetching the result of a finished job, delete the corresponding element of future. Each future object keeps the input and output data of its getOutput call, which may well be large. Be careful, though: deleting an element of future shifts the indices of all later elements, so the index returned by fetchNext no longer equals the original job number.

The following is the code I wrote for this purpose.

poolObj = parpool(workerCnt);
jobCnt = numel(data); % number of jobs
output = cell(jobCnt,1);
future(1:jobCnt) = parallel.FevalFuture; % preallocate the array of futures
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj,@getOutput,1,data{jobNo});
end
doneJobs = false(jobCnt,1);
while ~all(doneJobs)
    % idx indexes the current, already shortened, future array
    [idx,result] = fetchNext(future);
    future(idx) = []; % remove the done future object and the data it holds
    % map idx back to the original job number: the finished job is the
    % idx-th job not yet done, so skip over jobs that are already done
    oldIdx = 0;
    while oldIdx ~= idx
        doneJobsInIdxRange = sum(doneJobs((oldIdx + 1):idx));
        oldIdx = idx;
        idx = idx + doneJobsInIdxRange;
    end
    output{idx} = result;
    doneJobs(idx) = true;
end
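A possibly simpler way to keep track of job numbers after deleting futures, offered as an alternative sketch rather than the answerer's code, is to keep a vector of original job numbers that shrinks together with future. The data and getOutput definitions below are hypothetical stand-ins so the sketch runs on its own, and it assumes the Parallel Computing Toolbox (gcp returns the current pool, starting one if none exists).

```matlab
% Hypothetical stand-ins: 8 x 1 cell of two-element columns and a
% placeholder analysis function
data = num2cell(reshape(1:16, 2, 8), 1).';
getOutput = @(x) sum(x);

poolObj = gcp();                          % current pool (or start one)
jobCnt = numel(data);
output = cell(jobCnt,1);
future(1:jobCnt) = parallel.FevalFuture;  % preallocate the futures
for jobNo = 1:jobCnt
    future(jobNo) = parfeval(poolObj, getOutput, 1, data{jobNo});
end

jobIds = 1:jobCnt;                        % jobIds(k) = original job of future(k)
while ~isempty(future)
    [idx,result] = fetchNext(future);
    output{jobIds(idx)} = result;         % store under the original job number
    future(idx) = [];                     % drop the finished future and its data
    jobIds(idx) = [];                     % keep jobIds aligned with future
end
```

Because jobIds is shortened in the same step as future, jobIds(idx) always gives the original job number directly, and no index-correction loop is needed.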
