g++, range-based for and vectorization

Question

Consider the following range-based for loop in C++11:

for ( T k : j )
{
  ...
}

Are there g++ or clang++ optimization flags that can speed up the compiled code?

I'm not talking about for loops in general; I'm only considering this new C++11 construct.

Recommended answer

Optimizing loops is very rarely about optimizing the actual loop iteration code (for ( T k : j ) in this case); it is much more about optimizing what is IN the loop.
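
To illustrate why the iteration code itself leaves little to tune, here is a minimal sketch of a range-based for loop next to the ordinary iterator loop it is roughly equivalent to. The element type (int), the container (std::vector) and the function names are assumptions made for this illustration; the question does not say what T or j are.

```cpp
#include <vector>

// Sum using the C++11 range-based for loop from the question,
// with T = int and j = std::vector<int> (both assumed here).
int sum_range_for(const std::vector<int>& j) {
    int total = 0;
    for (int k : j) {
        total += k;
    }
    return total;
}

// Roughly what the range-based loop stands for:
// a plain iterator loop from begin() to end().
int sum_iterator_loop(const std::vector<int>& j) {
    int total = 0;
    for (auto it = j.begin(), last = j.end(); it != last; ++it) {
        int k = *it;  // copy of the current element, like "int k : j"
        total += k;
    }
    return total;
}
```

Both functions do the same work per element, so any speed-up has to come from how the compiler treats the loop body.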

Now, since the loop body is just ... in this case, it's impossible to say whether, for example, unrolling the loop will help, or declaring functions inline [or simply moving them so that the compiler can see them and inline them], or using auto-vectorization, or perhaps using a completely different algorithm inside the loop.

The examples in the paragraph above, in a bit more detail:

  1. Unrolling the loop - essentially doing several loop iterations without going back to the start of the loop. This is most helpful when the loop body is very small. There is automatic unrolling, where the compiler does the unrolling, or you can unroll the code manually by processing, say, four items in each loop iteration and then either stepping four items forward in the loop variable update or advancing the iterator several times inside the loop body [but this of course means not using the range-based for loop].
  2. Inline functions - the compiler takes (usually small) functions and places their code into the loop itself, rather than emitting a call. This saves the time it takes for the processor to call out to another place in the code and return. Most compilers only do this for functions that are "visible" to the compiler during compilation, so the source has to be either in the same source file or in a header file that is included in the source file being compiled.
  3. Auto-vectorization - using SSE, MMX or AVX instructions to process multiple data items in one instruction (e.g. one SSE instruction can add four float values to another four float values). This is faster than operating on a single data item at a time (most of the time; sometimes there is no benefit, because of the additional work of combining the different data items and then sorting out what goes where when the calculation is finished).
  4. Choosing a different algorithm - there are often several ways to solve a particular problem. Depending on what you are trying to achieve, a for loop [of whatever kind] may not be the right solution in the first place, or the code inside the loop could perhaps use a cleverer way to calculate/rearrange/whatever-it-does to achieve the result you need.
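
The first three points can be sketched on a concrete loop body. The body below (scale and shift each float), the helper scale_and_shift and the function names are all hypothetical, chosen only for illustration, since the original loop body is not given.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical small helper. It is defined in the same translation unit,
// so the compiler can see its body and is free to inline it into the loops.
inline float scale_and_shift(float x, float a, float b) {
    return x * a + b;
}

// Range-based version: independent work per element on contiguous floats,
// which is the kind of loop an auto-vectorizer can often handle.
void transform_range_for(std::vector<float>& data, float a, float b) {
    for (float& x : data) {
        x = scale_and_shift(x, a, b);
    }
}

// Manually unrolled by four. This needs an index loop, so the
// range-based for loop can no longer be used.
void transform_unrolled(std::vector<float>& data, float a, float b) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        data[i]     = scale_and_shift(data[i],     a, b);
        data[i + 1] = scale_and_shift(data[i + 1], a, b);
        data[i + 2] = scale_and_shift(data[i + 2], a, b);
        data[i + 3] = scale_and_shift(data[i + 3], a, b);
    }
    // Handle the remaining 0 to 3 elements one at a time.
    for (; i < n; ++i) {
        data[i] = scale_and_shift(data[i], a, b);
    }
}
```

With both g++ and clang++, options such as -O3, -funroll-loops and -march=native are worth trying on the range-based version before unrolling by hand. g++ can report what it vectorized with -fopt-info-vec, and clang++ with -Rpass=loop-vectorize. Whether any of this makes a given loop faster depends on the loop body, so measure rather than assume.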

But ... is far too vague to say which, if any, of the above solutions will improve your code.
