Vectorizing nested if-statements

Problem description

Problem

I am currently processing data sets of roughly 18 million points that run through several different processes. Using the profile viewer, I found that one of my bottlenecks is the part of the code shown here, so I was wondering whether it is possible to vectorize multiple if-statements.

Code

WA = zeros(size(NB_list_z,1), 3);
for i = 1:size(NB_list_z,1)
    if NB_list_z(i,2)==0 || NB_list_z(i,3)==0
        % Fewer than two neighbors above: use Block1 alone
        WA(i,1) = BMLS(NB_list_z(i,1),5);
    else
        if BMLS(NB_list_z(i,3),5) >= COG
            % Block3 passes on its own: try the mean of Blocks 3+2+1
            WA(i,1) = (BMLS(NB_list_z(i,3),5)+BMLS(NB_list_z(i,2),5)+BMLS(NB_list_z(i,1),5))/3;
            if WA(i,1) < COG
                if BMLS(NB_list_z(i,2),5) >= COG
                    % Fall back to the mean of Blocks 2+1
                    WA(i,1) = (BMLS(NB_list_z(i,2),5)+BMLS(NB_list_z(i,1),5))/2;
                    if WA(i,1) < COG
                        WA(i,1) = BMLS(NB_list_z(i,1),5);
                    end
                else
                    WA(i,1) = BMLS(NB_list_z(i,1),5);
                end
            end
        else
            % Block3 fails on its own: try the mean of Blocks 2+1
            if BMLS(NB_list_z(i,2),5) >= COG
                WA(i,1) = (BMLS(NB_list_z(i,2),5)+BMLS(NB_list_z(i,1),5))/2;
                if WA(i,1) < COG
                    WA(i,1) = BMLS(NB_list_z(i,1),5);
                end
            else
                WA(i,1) = BMLS(NB_list_z(i,1),5);
            end
        end
    end
end

Code description

NB_list_z contains, for each point listed in its first column, the indices of that point's neighbors in the z direction (every point can have up to two points above it). BMLS contains the values that I want to check against the threshold. COG is the threshold value. Call the lowest block Block1, the one above it Block2, and the one above that Block3.

The first if-clause sets the value to that of Block1 if no neighbors exist above it.

After that, I want to combine blocks in the way that is most profitable for me. That is, if the average of Blocks 3+2+1 is at or above the threshold, I want to include them all, but the highest block (here Block 3) must also reach the threshold on its own. If not, I try Blocks 2+1 under the same conditions, and if that fails too, only Block 1. The code above works perfectly fine on small data sets but starts to take a lot of time on bigger ones.

Question

I am new to "code optimization" and "vectorization" in the sense that I have only just started with them. I found some entries about removing for-loops and the like, but I could not find anything about removing or simplifying multiple if-clauses. Hence the question: is it possible to vectorize nested if-clauses?

Recommended answer

The code is a bit too long to rewrite on this forum, and definitely too long to rewrite without testing it against test data with expected output. However, let me write a bit about "vectorization" instead.

What is vectorization?

When we talk about vectorization in MATLAB, we commonly mean applying an operation to a whole vector instead of to each element of the vector in turn. Slightly oversimplified: instead of applying an operation to each element, we call a function that takes the vector as input. For this to be effective, the operation needs to be supported by MATLAB. What I mean is that the heavy work should be performed by compiled code (for example a built-in function or a mex-file).
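To make the contrast concrete, here is a minimal sketch that evaluates the same expression once element by element and once on the whole vector. The input x and the expression itself are made up for illustration; they are not part of the question.

```matlab
x = linspace(0, 1, 1e5);      % hypothetical input data

% Element-by-element version: the interpreter handles every iteration
y_loop = zeros(size(x));
for k = 1:numel(x)
    y_loop(k) = 3*x(k)*x(k) + 1;
end

% Vectorized version: one expression applied to the whole vector
y_vec = 3*x.*x + 1;

% Largest difference between the two results
max_diff = max(abs(y_loop - y_vec));
```

Both versions compute the same values; the vectorized one simply hands the whole array to MATLAB's built-in element-wise operators instead of looping in interpreted code.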

How is it done?

When you want to apply an operation to all elements in a vector, it is really simple. For example, instead of doing,

a = 1:2:20;
total = 0;
for k = a % range-based: k takes each value of a in turn
    total = total + k;
end
%for ind = 1:length(a) % index-based, same result
%    total = total + a(ind);
%end

you can do,

a = 1:2:20;
total = sum(a);

If you have an if statement in the loop, it is still possible to vectorize it. Assume you want to sum all elements smaller than 11 and all elements larger than 11 separately,

a = 1:2:20;
total1 = sum(a(a<11));
total2 = sum(a(a>11));
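For comparison, the loop that these two lines replace can be written out as below. This is only an illustrative sketch; the names small, large, total1_vec and total2_vec are introduced here and do not appear in the answer.

```matlab
a = 1:2:20;

% Loop with an if statement
total1 = 0;
total2 = 0;
for k = a
    if k < 11
        total1 = total1 + k;
    elseif k > 11
        total2 = total2 + k;
    end
end

% Same computation with logical masks
small = a < 11;               % true where the first branch would run
large = a > 11;               % true where the second branch would run
total1_vec = sum(a(small));   % 1+3+5+7+9
total2_vec = sum(a(large));   % 13+15+17+19
```

Each branch of the if statement becomes one logical mask, and the element equal to 11 falls into neither mask, just as it falls into neither branch of the loop.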

However, if you have nested if statements, it gets more complicated. You will likely need to split the operation into a number of expressions, because each branch of the if statement needs to be handled separately. Each nested if statement selects a subset of the elements selected by the outer if statement, so it can be handled by combining the conditions with and (&).

b = rand(10);
c = zeros(10);
mid = b>=0.5 & b<0.8;      % two conditions combined with &
c(b<0.5) = 0;
c(mid) = 2*(b(mid).^2);    % middle range: computed from b
c(b>=0.8) = 1;
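Applied to the loop in the question, the same masking idea might look like the sketch below. It is untested against the asker's real data: the small BMLS, NB_list_z and COG values are invented for illustration, and the helper names (v, has, b1, b2, b3, m2, m3, use2, use3) are hypothetical. It assumes that column 1 of NB_list_z always holds a valid row index into BMLS and that a 0 in columns 2 or 3 means "no neighbor".

```matlab
% Hypothetical test data (not from the question)
BMLS = zeros(6, 5);
BMLS(:,5) = [10; 4; 12; 2; 9; 20];
COG = 8;
NB_list_z = [1 2 3; 2 3 0; 3 0 0; 4 5 6; 5 6 1; 6 1 2; 1 4 2];

v  = BMLS(:,5);                          % values compared with COG
n1 = NB_list_z(:,1);
n2 = NB_list_z(:,2);
n3 = NB_list_z(:,3);

has = n2 ~= 0 & n3 ~= 0;                 % rows with both neighbors above

b1 = v(n1);                              % Block1 value for every row
b2 = zeros(size(b1));
b3 = zeros(size(b1));
b2(has) = v(n2(has));                    % index only where a neighbor exists
b3(has) = v(n3(has));

m3 = (b3 + b2 + b1) / 3;                 % mean of Blocks 3+2+1
m2 = (b2 + b1) / 2;                      % mean of Blocks 2+1

% Each mask corresponds to one set of branches of the nested ifs
use3 = has & b3 >= COG & m3 >= COG;
use2 = has & ~use3 & b2 >= COG & m2 >= COG;

col = b1;                                % default: Block1 alone
col(use2) = m2(use2);
col(use3) = m3(use3);

WA = zeros(size(NB_list_z,1), 3);
WA(:,1) = col;
```

The design choice is to compute every candidate value for all rows up front and then let the masks pick one per row. That costs a few temporary arrays the size of the input, which is worth keeping in mind for 18 million points, and the result should be compared against the original loop on a subset of real data before it is trusted.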

When to vectorize

It may not be worth vectorizing a function that is used only a few times and executes "sufficiently fast". Beyond that, it becomes a trade-off between complexity and efficiency. A function that executes in a hundredth of a second may still need optimizing if it is called 10000 times during a run. Normally, the more general functions are the ones that need optimization, since they tend to attract a higher number of function calls. Also, if you run nested for loops where there is a dependency between the loops, they tend to be hard to vectorize.

a = 2:2:20;
for m = 1:length(a)
    for n = 1:length(a)
        if m ~= n
            a(n) = a(n)/2;
        end
    end
    a(a>5) = 2*a(a>5);
end

This becomes quite complicated, because the inner loop depends on the specific iteration of the outer loop. It may still be possible to solve, but you will have a problem similar to finding an orthogonal parametrization for a double integral. If it is not absolutely necessary, it may not be worth the effort, and even if vectorizing is crucial, it may be more worthwhile to redefine the problem in a more vectorizable way than to vectorize these loops as they stand.
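Whether the effort pays off is easiest to judge by measuring. The following is a minimal timing sketch using tic and toc on a made-up problem (the size n and the condition are assumptions for illustration). The measured times depend on the machine and MATLAB version, so no expected numbers are given.

```matlab
n = 2e5;                      % hypothetical problem size
b = linspace(0, 1, n);

% Loop version
tic;
c_loop = zeros(1, n);
for k = 1:n
    if b(k) >= 0.5
        c_loop(k) = 2*b(k);
    end
end
t_loop = toc;

% Vectorized version
tic;
c_vec = zeros(1, n);
mask = b >= 0.5;
c_vec(mask) = 2*b(mask);
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s\n', t_loop, t_vec);
```

For more reliable numbers, the profiler mentioned in the question or MATLAB's timeit function are better suited than a single tic/toc pair.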

Some final words

Note that for large data sets, vectorization may create copies of a large number of elements. Also make sure that you are not modifying the input to a function: MATLAB uses copy-on-write, so the first write to an input array forces a copy of the whole array.
