分支感知编程 [英] Branch-aware programming

查看:69
本文介绍了分支感知编程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读有关分支预测错误的信息,这可能是应用程序性能的一个热门瓶颈.正如我所看到的,人们经常显示出 assembly 代码来揭示问题,并指出程序员通常可以预测分支在大多数情况下可以到达的位置,并避免分支错误预测.

I'm reading around that branch misprediction can be a hot bottleneck for the performance of an application. As I can see, people often show assembly code that unveil the problem and state that programmers usually can predict where a branch could go the most of the times and avoid branch mispredictons.

我的问题是:

1-是否可以使用某些高级编程技术(即无汇编)来避免分支预测错误?

1- Is it possible to avoid branch mispredictions using some high level programming technique (i.e. no assembly)?

2-使用高级编程语言(我对C和C ++最为感兴趣)生成适合分支的代码时,我应该牢记什么?

2- What should I keep in mind to produce branch-friendly code in a high level programming language (I'm mostly interested in C and C++)?

欢迎使用代码示例和基准测试!

Code examples and benchmarks are welcome!

推荐答案

人们经常...并指出程序员通常可以预测分支的去向

people often ... and state that programmers usually can predict where a branch could go

(*)经验丰富的程序员经常提醒人们,程序员很难预测这一点.

(*) Experienced programmers often remind that human programmers are very bad at predicting that.

1-是否可以使用某些高级编程技术(即无需汇编)来避免分支预测错误?

1- Is it possible to avoid branch mispredictions using some high level programming technique (i.e. no assembly)?

不是在标准c ++或c中.至少没有一个分支.您可以做的是最大程度地减少依赖链的深度,以使分支错误预测不会产生任何影响.现代的cpus将执行分支的两个代码路径,并删除未选择的分支.但是,这有一个局限性,这就是为什么分支预测仅在深层依赖链中才重要的原因.

Not in standard c++ or c. At least not for a single branch. What you can do is minimize the depth of your dependency chains so that branch mis-prediction would not have any effect. Modern cpus will execute both code paths of a branch and drop the one that wasn't chosen. There's a limit to this however, which is why branch prediction only matters in deep dependency chains.

某些编译器提供扩展以手动建议预测,例如 __builtin_expect 在gcc中.这是一个 stackoverflow问题.更好的是,某些编译器(例如gcc)支持对代码进行性能分析并自动检测最佳预测.由于(*),使用概要分析而不是手动工作是很聪明的.

Some compilers provide extension for suggesting the prediction manually such as __builtin_expect in gcc. Here is a stackoverflow question about it. Even better, some compilers (such as gcc) support profiling the code and automatically detect the optimal predictions. It's smart to use profiling rather than manual work because of (*).

2-使用高级编程语言(我主要对C和C ++感兴趣)生成对分支友好的代码时,我应该牢记什么?

2- What should I keep in mind to produce branch-friendly code in a high level programming language (I'm mostly interested in C and C++)?

最重要的是,您应该记住,分支错误预测只会在程序最关键的性能部分影响您,而不必担心,直到您测量并发现问题为止.

Primarily, you should keep in mind that branch mis-prediction is only going to affect you in the most performance critical part of your program and not to worry about it until you've measured and found a problem.

但是当某些探查器(valgrind,VTune等)告诉我foo.cpp的第n行我得到分支预测惩罚时,该怎么办?

But what can I do when some profiler (valgrind, VTune, ...) tells that on line n of foo.cpp I got a branch prediction penalty?

伦丁给出了非常明智的建议

Lundin gave very sensible advice

  1. 测量是否重要.
  2. 如果有关系,那么
    • 最小化计算的依赖链的深度.如何做到这一点可能非常复杂,超出了我的专业知识,如果不进行组装,您将无能为力.您可以使用高级语言进行的操作是最大程度地减少条件检查(**)的次数.否则,您将受到编译器优化的支配.避免深度依赖链也可以更有效地使用乱序的超标量处理器.
    • 使分支机构始终可预测.效果可以在 stackoverflow中看到问题.在问题中,数组上有一个循环.循环包含一个分支.分支取决于当前元素的大小.当对数据进行排序时,使用特定的编译器进行编译并在特定的cpu上运行时,可以证明循环更快.当然,保持所有数据的排序也将花费cpu时间,可能比分支错误预测所花费的时间还多,所以 measure .
  1. Measure fo find out whether it matters.
  2. If it matters, then
    • Minimize the depth of dependency chains of your calculations. How to do that can be quite complicated and beyond my expertise and there's not much you can do without diving into assembly. What you can do in a high level language is to minimize the number of conditional checks (**). Otherwise you're at the mercy of compiler optimization. Avoiding deep dependency chains also allows more efficient use of out-of-order superscalar processors.
    • Make your branches consistently predictable. The effect of that can be seen in this stackoverflow question. In the question, there is a loop over an array. The loop contains a branch. The branch depends on size of the current element. When the data was sorted, the loop could be demonstrated to be much faster when compiled with a particular compiler and run on a particular cpu. Of course, keeping all your data sorted will also cost cpu time, possibly more than the branch mis-predictions do, so, measure.

2和3.的顺序可以切换.手动优化代码是很多工作.另一方面,对于某些程序而言,收集概要分析数据也可能很困难.

Order of 2. and 3. may be switched. Optimizing your code by hand is a lot of work. On the other hand, gathering the profiling data can be difficult for some programs as well.

(**)做到这一点的一种方法是通过展开循环来改变循环.您还可以让优化器自动执行此操作.不过,您必须进行衡量,因为展开会影响您与缓存的交互方式,并且很可能最终导致悲观情绪.

(**) One way to do that is transform your loops by for example unrolling them. You can also let the optimizer do it automatically. You must measure though, because unrolling will affect the way you interact with the cache and may well end up being a pessimization.

这篇关于分支感知编程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆