BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?


Question

Is there any way to determine, or any resource where I can find, the branch target buffer (BTB) size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake Intel processors?

Recommended answer

Check the software optimization resources by Agner Fog, http://www.agner.org/optimize/

The BTB should be covered in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf


3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge

BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.
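The "more than 3 branch instructions per 16 bytes" rule quoted above can be checked mechanically once the addresses of the branch instructions are known (for example, from a disassembly listing). The following is a minimal sketch, not a tool from Agner Fog's guide: the function name `dense_blocks`, the sample addresses, and the assumption that the 16-byte blocks are aligned to 16-byte boundaries are all illustrative.

```python
from collections import Counter


def dense_blocks(branch_addresses, block_size=16, limit=3):
    """Return {block_start: branch_count} for every block holding more
    than `limit` branches.

    ASSUMPTION: blocks are aligned to `block_size`; the quoted text does
    not say how the 16-byte windows are positioned.
    """
    counts = Counter(addr // block_size * block_size for addr in branch_addresses)
    return {start: n for start, n in sorted(counts.items()) if n > limit}


# Hypothetical addresses of branch instructions in some function.
addrs = [0x1000, 0x1003, 0x1006, 0x100A, 0x1010, 0x1024]

for start, n in dense_blocks(addrs).items():
    print(f"block {start:#x}: {n} branches")
```

Blocks reported by such a check are candidates for padding or reordering, if measurements confirm that branch density is actually the bottleneck.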

3.8 Branch prediction in Intel Haswell, Broadwell and Skylake

BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.

Intel may give some data in the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html), around section "3.4.1 Branch Prediction Optimization", but it still lists no sizes.

It may look strange, but there was no information about the BTB in CPUID in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in any public material from Intel employees.


 * This table describes the possible cache and TLB configurations
 * as documented by Intel. For now AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic Pentii2).
 * With 80 possible caches we are on the safe side for one or two years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...


There should be some performance monitoring unit (PMU) counters for the BTB, and there are experiments that derive the BTB size by running special test programs; see http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt.
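Experiments of this kind usually generate a chain of unconditional jumps with a chosen count and spacing, run it in a loop, and watch where misprediction counters start to climb as the count grows. The sketch below only generates such a chain as NASM-style assembly text. It is an illustration of the idea, not Matt Godbolt's or Agner Fog's actual test code; the function name `make_branch_chain`, the label names, and the sweep values are hypothetical.

```python
def make_branch_chain(num_branches, spacing):
    """Return NASM-style source for `num_branches` unconditional jumps,
    each placed at the start of its own `spacing`-byte aligned slot and
    jumping to the next slot.

    ASSUMPTION: the assembler pads `align` gaps in code with NOPs, and
    `spacing` is large enough to hold one jmp instruction.
    """
    if spacing < 8 or spacing & (spacing - 1):
        raise ValueError("spacing must be a power of two >= 8")
    lines = []
    for i in range(num_branches):
        lines.append(f"align {spacing}")
        lines.append(f"branch_{i}:")
        lines.append(f"    jmp branch_{i + 1}")
    # Final label so the last jump has a target.
    lines.append(f"align {spacing}")
    lines.append(f"branch_{num_branches}:")
    return "\n".join(lines)


# A sweep would generate one test body per (count, spacing) pair and time
# or count mispredictions for each; here we only print a small sample.
print(make_branch_chain(num_branches=4, spacing=16))
```

Varying the branch count probes total capacity, while varying the spacing probes which address bits select a set, which is how set and way counts can be inferred from the outside.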


Conclusions

From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.

For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.
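To make the Ivy Bridge figure above concrete: 4096 entries in 4 ways gives 1024 sets, so branches whose addresses map to the same set compete for only 4 slots. The sketch below shows that arithmetic with a deliberately simple index function. The index function is an assumption for illustration only (low address bits above a 16-byte granule, modulo the set count); Intel does not document the real one, and the quoted conclusion says Haswell changed it.

```python
ENTRIES = 4096          # from the quoted Ivy Bridge conclusion
WAYS = 4                # from the quoted Ivy Bridge conclusion
SETS = ENTRIES // WAYS  # 1024 sets


def set_index(address, sets=SETS, shift=4):
    """Map a branch address to a set number.

    ASSUMPTION: index = (address >> shift) % sets. Both the shift of 4
    (16-byte granule) and the plain modulo are illustrative guesses, not
    the documented hardware behaviour.
    """
    return (address >> shift) % sets


# Under this model, addresses SETS << 4 bytes apart land in the same set.
stride = SETS << 4
aliasing = [0x400000 + i * stride for i in range(WAYS + 1)]

indices = {set_index(a) for a in aliasing}
print(f"{len(aliasing)} branches map to {len(indices)} set(s) with {WAYS} ways")
```

With five branches sharing one 4-way set, at least one entry must be evicted on every pass, which is the kind of cliff the measurement programs look for.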

More of his posts about branch prediction and its events:

  • http://xania.org/201602/bpu-part-one Static branch prediction on newer Intel processors
  • http://xania.org/201602/bpu-part-two Branch prediction - part two
  • http://xania.org/201602/bpu-part-three The BTB in contemporary Intel chips
  • http://xania.org/201602/bpu-part-four Branch Target Buffer, part 2

His code, based on Agner's tests, is public: https://github.com/mattgodbolt/agner, in particular https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py
