Is a pointer indirection more costly than a conditional?

Question

I've observed that most decent compilers can precompute a pointer indirection to varying degrees, possibly removing most branching instructions, but what I'm interested in is whether the cost of an indirection is greater than the cost of a branch point in the generated code.

I would expect that if the data referenced by the pointer is not in a cache at runtime, a cache flush might occur, but I don't have any data to back that up.

Does anyone have solid data (or a justifiable opinion) on the matter?


Edit: Several posters noted that there is no "general case" for the cost of branching: it varies wildly from chip to chip.

If you happen to know of a notable case where branching would be cheaper (with or without branch prediction) than an in-cache indirection, please mention it.

Recommended answer

This depends heavily on the circumstances.

1 How often is the data in cache (L1, L2, L3), and how often must it be fetched all the way from RAM?

A fetch from RAM will take around 10-40ns. Of course, that fills a whole cache line in little more than that time, so if you then use the next few bytes as well, it will definitely not "hurt as badly".

2 What processor is it?

Older Intel Pentium 4 processors were famous for their long pipelines, and would take 25-30 clock cycles (~15ns at 2GHz) to "recover" from a mispredicted branch.
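The cycle-to-time conversion behind that figure is simple enough to check by hand. A minimal sketch (the helper name `cycles_to_ns` is made up for illustration):

```c
/* Convert a cycle count to nanoseconds for a given clock rate.
 * A clock of G GHz completes G cycles per nanosecond, so
 * time_ns = cycles / G. */
double cycles_to_ns(double cycles, double clock_ghz) {
    return cycles / clock_ghz;
}
```

At 2GHz, 25 cycles come to 12.5ns and 30 cycles to 15ns, which matches the "~15ns" upper end quoted above.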

3 How "predictable" is the condition?

Branch prediction really helps in modern processors, and they can cope quite well with "unpredictable" branches too, but a misprediction still hurts a little.
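To make the two alternatives concrete, here is a rough sketch of the same computation written once with a conditional and once with a pointer indirection. The function names and the threshold-sum task are assumptions chosen for illustration, not something from the question. Note that an optimizing compiler may well turn either version into a conditional move or vectorized code, so the source shape does not guarantee the machine-code shape.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching version: the CPU has to predict `data[i] >= threshold`
 * for every element. Cheap when the pattern is predictable, costly
 * when it is effectively random. */
int64_t sum_branch(const int32_t *data, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

/* Indirection version: the comparison result (0 or 1) selects one of
 * two pointers, and the value is loaded through the chosen pointer.
 * There is no data-dependent branch in the source, only an extra load. */
int64_t sum_indirect(const int32_t *data, size_t n, int32_t threshold) {
    static const int32_t zero = 0;
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        const int32_t *src[2] = { &zero, &data[i] };
        sum += *src[data[i] >= threshold];
    }
    return sum;
}
```

Both functions return the same result; which one runs faster depends on how predictable the condition is for the actual input and on what the compiler generates.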

4 How "busy" and "dirty" is the cache?

If dirty data has to be thrown out (written back) to free the cache line, that adds another 15-50ns on top of the "fetch the data in" time.

The indirection itself will be a fast instruction, but of course, if the next instruction uses the loaded data right away, you may not be able to execute that instruction immediately, even if the data is in L1 cache.
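The clearest case of an instruction waiting on the result of an indirection is pointer chasing. The sketch below contrasts a linked-list walk, where each load address comes from the previous load, with an array walk, where addresses are known in advance. The `struct node` layout and function names are hypothetical and only serve as an illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
    int32_t value;
    const struct node *next;
};

/* Dependent loads: the address of the next node is only known once
 * the current node's `next` field has been loaded, so each step
 * waits for the previous load to complete. */
int64_t sum_list(const struct node *head) {
    int64_t sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}

/* Independent loads: every address is base + i * sizeof(int32_t),
 * so the hardware can issue loads ahead of their use. */
int64_t sum_array(const int32_t *values, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}
```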

On a good day (well predicted, target in cache, wind in the right direction, etc.), a branch, on the other hand, takes 3-7 cycles.

And finally, of course, the compiler USUALLY knows quite well what works best... ;)

In summary, it's hard to say for sure, and the only way to tell what is better IN YOUR case is to benchmark the alternatives. I would think that an indirect memory access is faster than a jump, but without seeing what code your source compiles to, it's quite hard to say.
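A minimal benchmarking sketch along those lines, assuming a toy workload where each element is either added or subtracted. All names here (`run_conditional`, `run_indirect`, `time_run`) are invented for the example. It uses the standard `clock()` function, which measures CPU time at coarse resolution, so use large inputs and many repetitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static int64_t op_add(int64_t acc, int32_t x) { return acc + x; }
static int64_t op_sub(int64_t acc, int32_t x) { return acc - x; }

/* Conditional dispatch: one branch per element picks the operation. */
int64_t run_conditional(const uint8_t *ops, const int32_t *vals, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i])
            acc = op_sub(acc, vals[i]);
        else
            acc = op_add(acc, vals[i]);
    }
    return acc;
}

/* Indirect dispatch: ops[i] selects an entry in a function-pointer table. */
int64_t run_indirect(const uint8_t *ops, const int32_t *vals, size_t n) {
    static int64_t (*const table[2])(int64_t, int32_t) = { op_add, op_sub };
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = table[ops[i] != 0](acc, vals[i]);
    return acc;
}

typedef int64_t (*run_fn)(const uint8_t *, const int32_t *, size_t);

/* Run `fn` `reps` times and return the elapsed CPU time in seconds.
 * The last result is stored in *result; the volatile sink keeps the
 * compiler from discarding the repeated calls. */
double time_run(run_fn fn, const uint8_t *ops, const int32_t *vals,
                size_t n, int reps, int64_t *result) {
    volatile int64_t sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink = fn(ops, vals, n);
    clock_t end = clock();
    *result = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `time_run` once with each variant on the same input, and try both a predictable `ops` pattern and a random one. Keep in mind that a call through a function pointer is itself an indirect branch that the CPU predicts, and that the compiler may inline or rewrite either variant, so inspect the generated assembly before drawing conclusions.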
