I'm trying to optimize this C code using 4-way loop unrolling


Problem description

What I'm trying to do is take this C code and optimize it using a technique called loop unrolling; in this case I want to use four-way loop unrolling. I understand the technique and the concept, I just don't know how to apply it to this code. Do I have to add some extra variables? Do I need some code after each loop, or just at the end of all the loops? This is 8x8 blocked code that takes pixels and rotates the image 90 degrees counterclockwise. Any help would be greatly appreciated. Thank you.

/*
 * rotate8 - rotate with 8x8 blocking
 */

char rotate8_descr[] = "rotate8: rotate with 8x8 blocking";

void rotate8(int dim, pixel *src, pixel *dst)
{
    int i, j, ii, jj;

    for (ii = 0; ii < dim; ii += 8)
        for (jj = 0; jj < dim; jj += 8)
            for (i = ii; i < ii + 8; i++)
                for (j = jj; j < jj + 8; j++)
                    dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
}

Solution

You can replace the inner loop with 8 explicit lines of code:

          dst[RIDX(dim-1-jj, i, dim)] = src[RIDX(i, jj, dim)];
          dst[RIDX(dim-1-(jj+1), i, dim)] = src[RIDX(i, (jj+1), dim)];
          ...
          dst[RIDX(dim-1-(jj+7), i, dim)] = src[RIDX(i, (jj+7), dim)];

So you are replacing the loop variable by explicitly writing one line for each value it takes.

Now you can repeat that for the 8 values of the next loop out, which gives you 8 x 8 lines of code, and so on.
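
For the four-way case the question asks about, here is a rough sketch rather than a definitive implementation. It assumes `pixel` is a small struct and `RIDX(i, j, n)` is a row-major index macro (neither definition appears in the post), and that `dim` is a multiple of 8. The names `rotate8_unroll4` and `rotations_match` are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed definitions: the post does not show pixel or RIDX. */
typedef struct { unsigned short red, green, blue; } pixel;
#define RIDX(i, j, n) ((i) * (n) + (j))

/* Baseline from the question, kept here only for comparison. */
void rotate8(int dim, pixel *src, pixel *dst)
{
    int i, j, ii, jj;

    assert(dim % 8 == 0);  /* assumption: dim is a multiple of the block size */
    for (ii = 0; ii < dim; ii += 8)
        for (jj = 0; jj < dim; jj += 8)
            for (i = ii; i < ii + 8; i++)
                for (j = jj; j < jj + 8; j++)
                    dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
}

/* Four-way unrolled inner loop: j steps by 4 and the body handles
 * j, j+1, j+2 and j+3, so each 8-wide row takes two iterations. */
void rotate8_unroll4(int dim, pixel *src, pixel *dst)
{
    int i, j, ii, jj;

    assert(dim % 8 == 0);
    for (ii = 0; ii < dim; ii += 8)
        for (jj = 0; jj < dim; jj += 8)
            for (i = ii; i < ii + 8; i++)
                for (j = jj; j < jj + 8; j += 4) {
                    dst[RIDX(dim - 1 - j,       i, dim)] = src[RIDX(i, j,     dim)];
                    dst[RIDX(dim - 1 - (j + 1), i, dim)] = src[RIDX(i, j + 1, dim)];
                    dst[RIDX(dim - 1 - (j + 2), i, dim)] = src[RIDX(i, j + 2, dim)];
                    dst[RIDX(dim - 1 - (j + 3), i, dim)] = src[RIDX(i, j + 3, dim)];
                }
}

/* Hypothetical helper: returns 1 if both versions produce the same image. */
int rotations_match(int dim)
{
    size_t n = (size_t)dim * (size_t)dim;
    pixel *src = malloc(n * sizeof *src);
    pixel *ref = malloc(n * sizeof *ref);
    pixel *out = malloc(n * sizeof *out);
    int same = 0;

    if (src && ref && out) {
        /* Fill the source with a simple, position-dependent pattern. */
        for (size_t k = 0; k < n; k++) {
            src[k].red   = (unsigned short)k;
            src[k].green = (unsigned short)(k * 7);
            src[k].blue  = (unsigned short)(k * 13);
        }
        rotate8(dim, src, ref);
        rotate8_unroll4(dim, src, out);
        same = memcmp(ref, out, n * sizeof *ref) == 0;
    }
    free(src);
    free(ref);
    free(out);
    return same;
}
```

Because the block width of 8 is a multiple of 4, this needs no extra variables and no clean-up code after the loop. A remainder loop would only be needed if the trip count were not divisible by the unroll factor.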

Other than as an exercise in understanding, this seems pretty pointless to me. Compilers do this kind of thing really efficiently, and they will optimise where it makes sense. Hand-rolling rarely produces optimal code.
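
If the goal is simply to get unrolled code rather than to write it by hand, one option is to ask the compiler. The sketch below uses GCC's `#pragma GCC unroll 4` (available in GCC 8 and later, as far as I know); other compilers may ignore the pragma or spell it differently. The `pixel` and `RIDX` definitions are assumptions, since the post does not show them, and `rotate8_pragma` is a hypothetical name.

```c
#include <assert.h>

/* Assumed definitions: the post does not show pixel or RIDX. */
typedef struct { unsigned short red, green, blue; } pixel;
#define RIDX(i, j, n) ((i) * (n) + (j))

/* Same loop nest as the question, with a hint asking the compiler
 * to unroll the innermost loop four times. */
void rotate8_pragma(int dim, pixel *src, pixel *dst)
{
    int i, j, ii, jj;

    assert(dim % 8 == 0);  /* assumption: dim is a multiple of the block size */
    for (ii = 0; ii < dim; ii += 8)
        for (jj = 0; jj < dim; jj += 8)
            for (i = ii; i < ii + 8; i++) {
                /* The pragma must sit directly before the loop it applies to. */
                #pragma GCC unroll 4
                for (j = jj; j < jj + 8; j++)
                    dst[RIDX(dim - 1 - j, i, dim)] = src[RIDX(i, j, dim)];
            }
}
```

Building with optimisation enabled (for example `-O2`) and comparing the generated assembly with the hand-unrolled version is a reasonable way to check whether the manual work buys anything.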
