Is there a way to optimize/vectorize these loops over elements in a 3D array, without requiring significantly more memory?

Question

I am looking for some help to speed up some MATLAB code.

A minimal example is shown below. The code does some calculations on a 3D matrix defined on [x,y,z] coordinates. According to the profiler, the inner loop over ind is the time-consuming part, so I am wondering whether this loop can be optimized, or removed/vectorized completely.

Nx = 8; % Number of grid points
Ny = 6;
Nz = 4;
Ntot = Nx*Ny*Nz;

xvals = rand(1,Nx); % Create grid vectors
yvals = rand(1,Ny);
zvals = rand(1,Nz);

input_vec = rand(Ny,Nx,Nz); % Generate a dummy 3D matrix ( meshgrid convention, [y,x,z] )
input_vec = reshape( permute(input_vec,[3,1,2]) , [Ntot 1]); % Unwrap to 1D, so z cycles fastest, then y, then x

C1 = 5; % Loop counters
C2 = 6;
C3 = 7;

output_vec = zeros(Ntot,1); % Preallocate
temp_vec = zeros(Ntot,1);

for cnt1 = 1:C1
    for cnt2 = 1:C2
        for cnt3 = 1:C3
            
            factor1 = xvals*cnt1; % Calculate some vectors which depend on cnt variables
            factor2 = yvals*cnt2;
            factor3 = zvals*cnt3;
            
            for ind = 1:Ntot % Loop over every grid point
                j1 = floor( floor((ind-1) / Nz) / Ny) + 1; % +1 and -1 's account for Matlab's [1] indexing
                j2 = mod( floor((ind-1)/Nz) , Ny ) + 1;
                j3 = mod( (ind-1), Nz ) + 1;
                temp_vec(ind) = input_vec(ind) * factor1(j1)*factor2(j2)*factor3(j3);
            end
            output_vec = output_vec + temp_vec;
        end
    end
end

In my real application, the number of points is more like 1024x1024x512, so I have tried to avoid using lots of meshgrid-formatted variables (which contain a lot of repeated information) in order to keep the memory requirements down. This is why the 3D array has been unwrapped to 1D in the code above. For example, one solution might be to precalculate all the j1,j2,j3 values like so:

j1 = 1:Nx;
j2 = 1:Ny;
j3 = 1:Nz;
[J1,J2,J3] = meshgrid(j1,j2,j3);
J1 = reshape( permute(J1,[3,1,2]) , [Ntot 1]); % Unwrap to 1D, so z cycles fastest, then y, then x 
J2 = reshape( permute(J2,[3,1,2]) , [Ntot 1]); 
J3 = reshape( permute(J3,[3,1,2]) , [Ntot 1]); 

but this requires much more RAM than calculating a single value of j each time depending on the value of ind.

Can anyone help with a better/faster (but still memory efficient) way to do this? Thank you.

Recommended answer

The following:

factor1 = xvals*cnt1; % Calculate some vectors which depend on cnt variables
factor2 = yvals*cnt2;
factor3 = zvals*cnt3;
for ind = 1:Ntot % Loop over every grid point
   j1 = floor( floor((ind-1) / Nz) / Ny) + 1; % +1 and -1 's account for Matlab's [1] indexing
   j2 = mod( floor((ind-1)/Nz) , Ny ) + 1;
   j3 = mod( (ind-1), Nz ) + 1;
   temp_vec(ind) = input_vec(ind) * factor1(j1)*factor2(j2)*factor3(j3);
end

can be written as (not tested):

factor1 = xvals*cnt1; % Calculate some vectors which depend on cnt variables
factor2 = yvals*cnt2;
factor3 = zvals*cnt3;
ind = 1:Ntot;
j1 = floor( floor((ind-1) / Nz) / Ny) + 1; % +1 and -1 's account for Matlab's [1] indexing
j2 = mod( floor((ind-1)/Nz) , Ny ) + 1;
j3 = mod( (ind-1), Nz ) + 1;
temp_vec = input_vec(:) .* ( factor1(j1).*factor2(j2).*factor3(j3) ).'; % transpose the row of factors so temp_vec stays Ntot-by-1, like output_vec

(In particular, not indexing with ind could make a big difference, though I think this is a special case that has been optimized in recent versions of MATLAB.)
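To put a rough number on the intermediates in the vectorized version, here is a back-of-the-envelope estimate at the grid size from the question. It assumes every array is stored in double precision (8 bytes per element); the variable names are introduced here for illustration only:

```matlab
Nx = 1024; Ny = 1024; Nz = 512;     % grid size quoted in the question
Ntot = Nx*Ny*Nz;                    % 2^29 grid points
bytes_per_vec = Ntot*8;             % one double-precision vector of length Ntot
GiB_per_vec = bytes_per_vec/2^30;   % size of a single such vector in GiB
GiB_index = 4*GiB_per_vec;          % ind, j1, j2 and j3 all held at the same time
fprintf('%g GiB per length-Ntot vector, %g GiB for ind, j1, j2 and j3 together\n', ...
        GiB_per_vec, GiB_index);
```

By contrast, the three factor vectors together hold only Nx+Ny+Nz = 2560 elements, roughly 20 KB, which is why it pays to avoid expanding them to full grid size.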

But there we're still creating large intermediate arrays (ind, j1, j2 and j3 each have Ntot elements). You should be able to simplify with (again, not tested, and likely buggy):

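% Assumes input_vec is still the Ny-by-Nx-by-Nz array (meshgrid layout), i.e. before it is unwrapped to 1D,
% and that output_vec is preallocated with the same 3D size.
factor1 = xvals*cnt1;                  % 1-by-Nx row, varies along dimension 2 (x)
factor2 = (yvals*cnt2).';              % Ny-by-1 column, varies along dimension 1 (y)
factor3 = permute(zvals*cnt3,[1,3,2]); % 1-by-1-by-Nz, varies along dimension 3 (z)
temp_vec = input_vec .* factor1 .* factor2 .* factor3; % implicit expansion, needs R2016b or later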
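The snippet above relies on implicit expansion, so it needs the data as a 3D array with one dimension per factor. In the question, input_vec has already been unwrapped to an Ntot-by-1 column with z cycling fastest, so one possible adaptation is to view it as Nz-by-Ny-by-Nx via reshape and orient the factors to match. The following is a minimal sketch rather than the answerer's tested code: f1, f2, f3, ref_vec, temp and max_err are names introduced here for illustration, the cnt values are arbitrary, and implicit expansion assumes MATLAB R2016b or later.

```matlab
% Small grid from the question; random data for illustration only
Nx = 8; Ny = 6; Nz = 4; Ntot = Nx*Ny*Nz;
xvals = rand(1,Nx); yvals = rand(1,Ny); zvals = rand(1,Nz);
input_vec = rand(Ntot,1);            % already unwrapped: z fastest, then y, then x
cnt1 = 2; cnt2 = 3; cnt3 = 4;        % one arbitrary iteration of the outer loops

factor1 = xvals*cnt1; factor2 = yvals*cnt2; factor3 = zvals*cnt3;

% Reference: the original element-by-element loop
ref_vec = zeros(Ntot,1);
for ind = 1:Ntot
    j1 = floor( floor((ind-1)/Nz) / Ny ) + 1;
    j2 = mod( floor((ind-1)/Nz), Ny ) + 1;
    j3 = mod( ind-1, Nz ) + 1;
    ref_vec(ind) = input_vec(ind) * factor1(j1)*factor2(j2)*factor3(j3);
end

% Broadcast version: orient each factor along the dimension it indexes
f3 = factor3(:);                     % Nz-by-1,     dimension 1 (z, fastest)
f2 = factor2(:).';                   % 1-by-Ny,     dimension 2 (y)
f1 = reshape(factor1, 1, 1, Nx);     % 1-by-1-by-Nx, dimension 3 (x, slowest)
temp = reshape(input_vec, Nz, Ny, Nx) .* f3 .* f2 .* f1;
temp_vec = temp(:);                  % back to an Ntot-by-1 column

max_err = max(abs(temp_vec - ref_vec));   % should be at rounding-error level
```

The two results can differ by a few units of rounding error because the multiplications happen in a different order. Each `.*` may still allocate a full-size temporary, but the per-point index vectors j1, j2 and j3 are never built.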
