When, if ever, is loop unrolling still useful?


Question

I've been trying to optimize some extremely performance-critical code (a quicksort that is called millions of times inside a Monte Carlo simulation) by loop unrolling. Here's the inner loop I'm trying to speed up:

// Search for elements to swap.
while(myArray[++index1] < pivot) {}
while(pivot < myArray[--index2]) {}

I tried unrolling to something like:

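while(true) {
    // Stop at the first element that is not less than the pivot.
    if(!(myArray[++index1] < pivot)) break;
    if(!(myArray[++index1] < pivot)) break;
    // More unrolling
}

while(true) {
    // Stop at the first element that is not greater than the pivot.
    if(!(pivot < myArray[--index2])) break;
    if(!(pivot < myArray[--index2])) break;
    // More unrolling
}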

This made absolutely no difference, so I changed it back to the more readable form. I've had similar experiences the other times I've tried loop unrolling. Given the quality of branch predictors on modern hardware, when, if ever, is loop unrolling still a useful optimization?
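For reference, the two forms of the left-to-right scan can be written as self-contained C functions. This is only a sketch: the function names `scan_plain` and `scan_unrolled`, the `int` element type, and the assumption that an element not less than the pivot exists ahead of the starting index are all illustrative and not taken from the original code. It shows that unrolling removes one backward jump per pair of elements but keeps one compare-and-branch per element, which is one plausible reason the measured time did not change.

```c
/* Plain scan: advance until an element that is not less than pivot.
   Assumes such an element exists after the starting index. */
int scan_plain(const int *a, int index, int pivot)
{
    while (a[++index] < pivot) {}
    return index;
}

/* Unrolled by two: one backward jump per two elements,
   but still one compare-and-branch per element. */
int scan_unrolled(const int *a, int index, int pivot)
{
    for (;;) {
        if (!(a[++index] < pivot)) break;
        if (!(a[++index] < pivot)) break;
    }
    return index;
}
```

Both functions return the same index for the same input, so any difference between them is purely a matter of how the loop control is laid out.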

Answer

Loop unrolling makes sense if you can break dependency chains. This gives an out-of-order or superscalar CPU the chance to schedule instructions better and thus run faster.

A simple example:

for (int i=0; i<n; i++)
{
  sum += data[i];
}

Here every addition depends on the previous value of sum, so the loop forms a single dependency chain. If a load stalls because of a cache miss on the data array, the CPU can do little but wait.

On the other hand, consider this code:

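// sum1..sum4 are assumed to be initialized to 0 before the loop.
int i = 0;
for (; i + 3 < n; i += 4)
{
  // Four independent accumulators, one per dependency chain.
  sum1 += data[i+0];
  sum2 += data[i+1];
  sum3 += data[i+2];
  sum4 += data[i+3];
}
sum = sum1 + sum2 + sum3 + sum4;
// Handle the leftover elements when n is not a multiple of 4.
for (; i < n; i++)
{
  sum += data[i];
}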

This version could run faster. If one of the additions stalls on a cache miss or some other delay, there are still three other dependency chains that do not depend on the stalled one. An out-of-order CPU can keep executing those.
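The two summation loops can be packaged as complete C functions for side-by-side testing. This is a minimal sketch under stated assumptions: the function names `sum_single` and `sum_unrolled4`, the `int` element type, and the `long` accumulators are illustrative choices that do not appear in the original answer.

```c
#include <stddef.h>

/* One accumulator: every addition waits for the previous one. */
long sum_single(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/* Four accumulators: four independent chains of additions. */
long sum_unrolled4(const int *data, size_t n)
{
    long sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 3 < n; i += 4) {
        sum1 += data[i + 0];
        sum2 += data[i + 1];
        sum3 += data[i + 2];
        sum4 += data[i + 3];
    }

    long sum = sum1 + sum2 + sum3 + sum4;

    /* Tail loop: leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

Both functions return the same total for integer data. Whether the unrolled one is actually faster depends on the CPU, the compiler, and the optimization level, and an optimizing compiler may already apply a similar transformation to integer sums, so it is worth measuring both on the target machine.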
