Efficient by-hand loop unrolling

Question

I have this C code:

for (k = 0; k < n_n; k++) {
    if (k == i || k == j) continue;
    dd=q2_vect[k]-q1_vect;
    d2=dd*dd;
    if (d2<0) {
        a=1;
        break;
    }
}

For compiler optimization reasons (on the SPE of the Cell processor), I need to unroll this loop by hand, so I tried:

dd=q2_vect[0]-q1_vect;
d2=dd*dd;
if (d2<0)goto done;

dd=q2_vect[1]-q1_vect;
d2=dd*dd;
if (d2<0)goto done;

dd=q2_vect[2]-q1_vect;
d2=dd*dd;
if (d2<0)goto done;

.....
.....

// end
goto notdone;

done: 
ok=0;

notdone:
.....

but I do not know how to deal with the

if (k == i || k == j) continue;

and with the fact that the loop count depends on "n_n", which changes on each run, so by hand I would have to write the code as many times as the maximal value "n_n" can take.

How do you think it can be fixed?

Recommended answer

Are you sure the code as written is correct? If dd is a signed integer type, the test d2<0 can only succeed when dd*dd overflows, and signed overflow is undefined behavior. If d2 is unsigned, or if dd and d2 are floating-point types, the condition in the if is never satisfied. It looks like you're doing a broken search for the first index k other than i or j where squaring the expression q2_vect[k]-q1_vect overflows.
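
If detecting that overflow really were the goal (an assumption; the question does not say so), it can be done without ever performing the overflowing multiplication. The sketch below assumes the elements are 32-bit signed integers, and the function name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, assuming int32_t elements: reports whether
   (q2 - q1) * (q2 - q1) would exceed INT32_MAX, without evaluating
   the square in 32 bits. */
bool square_of_diff_overflows(int32_t q2, int32_t q1)
{
    /* The difference of two int32_t values always fits in int64_t. */
    int64_t dd = (int64_t)q2 - (int64_t)q1;

    /* 46340 * 46340 = 2147395600 still fits in int32_t,
       46341 * 46341 = 2147488281 does not. */
    return dd > 46340 || dd < -46340;
}
```

Comparing against the largest value whose square still fits keeps every operation well defined, unlike testing d2<0 after the fact.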

As for efficiently skipping the i and j iterations, I would instead just look at where the unrolled "loop" stopped, and restart it at k+1 if k was equal to i or j. This is assuming the code in your loop has no side effects/running total, which is true as written, but I expect you might have meant for the code to do something else (like summing the squares).
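
A minimal sketch of that restart idea, under stated assumptions: the elements are floats, and because d2<0 can never fire for floats, a hypothetical threshold test (squared difference below `limit`) stands in for it. The names `hit`, `first_hit_from`, `first_hit_skipping` and `limit` are not from the question.

```c
#include <stddef.h>

/* Hypothetical test standing in for the question's `d2 < 0`:
   true when the squared difference is below `limit`. */
static int hit(const float *q2_vect, float q1_vect, float limit, size_t k)
{
    float dd = q2_vect[k] - q1_vect;
    return dd * dd < limit;
}

/* First index k in [start, n_n) that passes the test, or n_n if none does.
   The body is unrolled by four and contains no i/j check; the tail loop
   covers whatever is left, so n_n may be any run-time value. */
static size_t first_hit_from(const float *q2_vect, size_t n_n,
                             float q1_vect, float limit, size_t start)
{
    size_t k = start;
    for (; k + 4 <= n_n; k += 4) {
        if (hit(q2_vect, q1_vect, limit, k))     return k;
        if (hit(q2_vect, q1_vect, limit, k + 1)) return k + 1;
        if (hit(q2_vect, q1_vect, limit, k + 2)) return k + 2;
        if (hit(q2_vect, q1_vect, limit, k + 3)) return k + 3;
    }
    for (; k < n_n; k++)
        if (hit(q2_vect, q1_vect, limit, k)) return k;
    return n_n;
}

/* Same search, but a stop at i or j is discarded and the scan
   restarts at k + 1. Returns n_n when no other index passes. */
size_t first_hit_skipping(const float *q2_vect, size_t n_n, float q1_vect,
                          float limit, size_t i, size_t j)
{
    size_t k = first_hit_from(q2_vect, n_n, q1_vect, limit, 0);
    while (k < n_n && (k == i || k == j))
        k = first_hit_from(q2_vect, n_n, q1_vect, limit, k + 1);
    return k;
}
```

The i/j comparison runs only when the scan stops, at most a few times per call, rather than on every element. This works only because the loop body keeps no running state between iterations.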

Finally, I am highly skeptical of your wish to unroll the loop manually when you don't even seem to have working code to begin with. Any good compiler can unroll the loop for you, but often the type of loop unrolling you're looking to do makes performance worse rather than better. I think you'd do better getting your code to work correctly first, then measuring (and looking at the compiler-generated asm), and only trying to improve on that after you've determined there's a problem.
