How to write an instruction-cache-friendly program in C++?


Question

Recently Herb Sutter gave a great talk on "Modern C++: What You Need to Know". The main theme of this talk was efficiency, and how data locality and memory access patterns matter. He also explained why the CPU loves linear access to memory (array/vector). He took one example from another classic reference on this topic, "Game performance by Bob Nystrom".

After reading these articles, I understood that there are two types of cache that affect program performance:

  1. Data Cache
  2. Instruction Cache

The Cachegrind tool also reports instrumentation information for both cache types for our program. The first point has been explained by many articles/blogs, including how to achieve good data cache efficiency (data locality).

However, I did not find much information on the instruction cache, or on what we should take care of in our programs to get better performance. As per my understanding, we (programmers) do not have much control over which instructions execute or in what order.

It would be really nice if small C++ programs explained how this counter (i.e. the instruction cache) varies with our style of writing programs. What best practices should programmers follow to achieve better performance in this respect?

I mean, we can understand data cache topics through examples like vector vs. list; is it possible to explain the second point in a similar way? The main intention of this question is to understand this topic as much as possible.

Solution

Any code that changes the flow of execution affects the Instruction Cache. This includes function calls and loops as well as dereferencing function pointers.

When a branch or jump instruction is executed, the processor must determine whether the code at the destination is already in the instruction cache; if it is not, the processor has to spend extra time reloading the instruction cache from the destination of the branch.

For example, some processors may have a large enough instruction cache to hold the execution code for small loops. Some processors don't have a large instruction cache and simply reload it. Reloading the instruction cache takes time that could be spent executing instructions.

Search these topics:

  • Loop unrolling
  • Conditional instruction execution (available on ARM processors)
  • Inline functions
  • Instruction pipeline

Edit 1: Programming techniques for better performance
To improve performance and reduce instruction cache reloading, do the following:

Reduce "if" statements
Design your code to minimize "if" statements. This may include Boolean Algebra, using more math or simplifying comparisons (are they really needed?). Prefer to reduce the content of "then" and "else" clauses so that the compiler can use conditional assembly language instructions.
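
As a rough illustration of replacing an "if" with arithmetic, here is a minimal sketch. The function names are made up for this example, and an optimizing compiler may well generate the same machine code for both versions, so treat it as a demonstration of the idea rather than a guaranteed speedup.

```cpp
#include <cstddef>
#include <vector>

// Branchy version: written with an explicit "if" inside the loop.
std::size_t count_above_branchy(const std::vector<int>& values, int threshold) {
    std::size_t count = 0;
    for (int v : values) {
        if (v > threshold) {
            ++count;
        }
    }
    return count;
}

// Arithmetic version: the comparison yields false/true, which converts
// to 0/1 and is added directly, so the loop body has no "if" statement.
std::size_t count_above_arithmetic(const std::vector<int>& values, int threshold) {
    std::size_t count = 0;
    for (int v : values) {
        count += static_cast<std::size_t>(v > threshold);
    }
    return count;
}
```

Both functions return the same result; comparing the generated assembly or the Cachegrind counters for each is the only reliable way to see whether the rewrite changed anything on a given compiler.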

Define small functions as inline or macros
There is an overhead associated with calling functions, such as storing the return location and reloading the instruction cache. For functions with a small number of statements, try suggesting to the compiler that they be made inline. Inlining means to paste the contents of the code where the execution is, rather than making a function call. Since the function call is avoided, so is the need to reload the instruction cache.
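
A minimal sketch of the inline suggestion, using hypothetical helper names. Note that `inline` is only a hint as far as inlining goes; the compiler decides whether the call is actually replaced by the function body.

```cpp
#include <cstddef>

// Small helper marked inline: a request to the compiler, not a command.
inline int square(int x) { return x * x; }

// If square() is inlined, the loop body contains the multiplication
// directly and no call/return happens per element.
int sum_of_squares(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += square(data[i]);
    }
    return total;
}
```

The macro alternative mentioned above is not shown here; an inline function keeps type checking and evaluates its argument exactly once.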

Unroll loops
For small iterations, don't loop, but repeat the content of the loop (some compilers may do this at higher optimization level settings). The more content repeated, the fewer branches to the top of the loop and the less need to reload the instruction cache.
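
Here is a sketch of manual unrolling by a factor of four. The factor and the function names are arbitrary choices for illustration; a compiler at a higher optimization level may already do this on its own.

```cpp
#include <cstddef>

// Straightforward loop: one branch back to the top per element.
long long sum_rolled(const int* data, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Unrolled by four: one branch back to the top per four elements.
long long sum_unrolled4(const int* data, std::size_t n) {
    long long total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    // Remainder loop handles the last n % 4 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The unrolled body is larger, so there is a trade-off: fewer branches, but more code that has to fit in the instruction cache.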

Use table lookups, not "if" statements
Some programs use "if-else-if" ladders for mapping data to values. Each "if" statement is a break in the execution in the instruction cache. Sometimes, with a little math, the values can be placed in a table like an array and the index calculated mathematically. Once the index is known, the processor can retrieve the data without disrupting the instruction cache.
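
A small sketch of the table idea, assuming a made-up mapping from a score in the range 0 to 100 to a letter grade. The grading scale and function names are invented for this example.

```cpp
#include <cassert>

// "if-else-if" ladder: up to four comparisons and branches per call.
char grade_ladder(int score) {
    if (score >= 90) return 'A';
    else if (score >= 80) return 'B';
    else if (score >= 70) return 'C';
    else if (score >= 60) return 'D';
    else return 'F';
}

// Table version: the index is computed with a little math (score / 10)
// and the result is read from an array with no ladder of comparisons.
char grade_table(int score) {
    // Assumption for this sketch: callers pass a score in [0, 100].
    assert(score >= 0 && score <= 100);
    static const char kGrades[11] = {'F', 'F', 'F', 'F', 'F', 'F',
                                     'D', 'C', 'B', 'A', 'A'};
    return kGrades[score / 10];
}
```

The table works here because the ranges are evenly spaced; an irregular mapping would need a different index calculation or a larger table.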

Change data or data structures
If the type of data is constant, a program can be optimized around the data. For example, a program handling message packets could base its operations on the packet IDs (think array of function pointers). Functions would be optimized for packet processing.

Change linked lists to arrays or other random-access containers. Elements of an array can be accessed using math, without interrupting execution. Linked lists must be traversed (loop) to find an item.
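
The "array of function pointers" idea could look something like the sketch below. The `Packet` layout, the three packet IDs and the handler functions are all hypothetical. Keep in mind that the indirect call through the table is itself a change in execution flow, so this replaces a chain of comparisons with a single jump rather than removing branching altogether.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical packet layout for this sketch.
struct Packet {
    std::uint8_t id;
    int payload;
};

using Handler = int (*)(const Packet&);

// One small, focused handler per packet ID.
int handle_ping(const Packet& p) { return p.payload; }
int handle_data(const Packet& p) { return p.payload * 2; }
int handle_close(const Packet&) { return -1; }

// The packet ID is used directly as the index into the table.
constexpr std::array<Handler, 3> kHandlers = {handle_ping, handle_data,
                                              handle_close};

int dispatch(const Packet& p) {
    // Unknown IDs are rejected instead of indexing out of bounds.
    if (static_cast<std::size_t>(p.id) >= kHandlers.size()) {
        return 0;
    }
    return kHandlers[p.id](p);
}
```

With this layout, supporting a new packet type means adding one handler and one table entry, and `dispatch` itself stays the same size.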
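
To make the difference concrete, here is a minimal sketch of fetching the n-th element from each container. The function names are made up, and both functions assume `n` is a valid position.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Linked list: reaching element n means looping over n nodes.
// Assumption for this sketch: n < items.size().
int nth_in_list(const std::list<int>& items, std::size_t n) {
    auto it = items.begin();
    for (std::size_t i = 0; i < n; ++i) {
        ++it;
    }
    return *it;
}

// Vector: the element address is computed from the index, no loop needed.
// Assumption for this sketch: n < items.size().
int nth_in_vector(const std::vector<int>& items, std::size_t n) {
    return items[n];
}
```

The list version executes a loop branch per node visited, while the vector version is a single indexed load.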
