What is the overhead of using Intel Last Branch Record?


Question

Last Branch Record (LBR) refers to a collection of register pairs (MSRs) that store the source and destination addresses of recently executed branches. The document at http://css.csail.mit.edu/6.858/2012/readings/ia32/ia32-3b.pdf has more information in case you are interested.


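  • a) Can someone give an idea of how much LBR slows down the execution of common programs (both CPU-intensive and I/O-intensive)?

  • b) Is branch prediction turned off when LBR tracing is on?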

Answer

The paper "Intel Code Execution Trace Resources" (by Craig Pedersen and Jeff Acampora of Arium, Apr 29, 2012) lists three variants of branch tracing:


  • Last Branch Record (LBR) flag in the DebugCtlMSR and the corresponding LastBranchToIP and LastBranchFromIP MSRs, as well as the LastExceptionToIP and LastExceptionFromIP MSRs.

  • Branch Trace Store (BTS), using either cache-as-RAM or system DRAM.

  • Architecture Event Trace (AET), captured off the XDP port and stored externally in a connected In-Target Probe.

As stated on page 2, LBR saves its information in MSRs and "does not impede any real-time performance," but it is useful only for very short stretches of code ("effective trace display is very shallow and typically may only show hundreds of instructions."). It only saves information about 4-16 branches.
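To see why such a shallow record covers so little code, here is a minimal software sketch of an LBR-style stack. It is a toy model rather than code that reads the real MSRs: the class name `LbrStackModel`, its methods, and the addresses are all hypothetical, and the depth of 4 is just one value from the 4-16 range mentioned above.

```python
from collections import deque


class LbrStackModel:
    """Toy model of an LBR stack: only the newest `depth` branches survive."""

    def __init__(self, depth=16):
        # depth is an assumption; the paper cites 4-16 entries depending on the CPU.
        self.entries = deque(maxlen=depth)

    def record_branch(self, from_ip, to_ip):
        # Appending to a full deque silently discards the oldest (from, to) pair.
        self.entries.append((from_ip, to_ip))

    def snapshot(self):
        # Oldest surviving branch first, newest last.
        return list(self.entries)


lbr = LbrStackModel(depth=4)
for i in range(10):
    # Hypothetical source and destination addresses for ten taken branches.
    lbr.record_branch(0x1000 + 4 * i, 0x2000 + 4 * i)

for from_ip, to_ip in lbr.snapshot():
    print(f"{from_ip:#x} -> {to_ip:#x}")
```

After ten branches, only the last four remain, so the visible history reaches back just a few basic blocks no matter how long the program has run.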

BTS captures many pairs of branch "From" and "To" addresses and stores them in cache (Cache-as-RAM, CAR) or in system DRAM. With CAR, the trace depth is limited by the cache size (and some constant); with DRAM, the trace length is almost unlimited. The paper estimates the overhead of BTS at 20 to 100 percent because of the additional memory stores. BTS on Linux is easy to use with the proposed perf branch record (not yet in vanilla) or the btrax project. The perf branch presentation gives some hints about how BTS is organised: there is a BTS buffer whose records contain "from" and "to" fields and a "predicted" flag. So branch prediction is not turned off when using BTS. Also, when the BTS buffer fills up to its maximum size, an interrupt is generated. The BTS-handling module in the kernel (the perf_events subsystem or the btrax kernel module) should copy the data from the BTS buffer to another location when such an interrupt occurs.
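The buffer behaviour described above can be sketched as a small simulation. This is only an illustration under stated assumptions: `BtsRecord`, `BtsBufferModel`, the `on_full` callback, and the capacity of 4 are hypothetical names and values, not the real hardware record layout or any kernel API.

```python
from dataclasses import dataclass


@dataclass
class BtsRecord:
    # Fields mirror the "from", "to" and "predicted" items mentioned above.
    from_ip: int
    to_ip: int
    predicted: bool


class BtsBufferModel:
    """Toy BTS buffer: raises a simulated interrupt when it fills up."""

    def __init__(self, capacity, on_full):
        self.capacity = capacity  # hypothetical buffer size, in records
        self.on_full = on_full    # stands in for the kernel's interrupt handler
        self.records = []
        self.interrupts = 0

    def record_branch(self, from_ip, to_ip, predicted):
        self.records.append(BtsRecord(from_ip, to_ip, predicted))
        if len(self.records) >= self.capacity:
            # Buffer full: "interrupt", let the handler copy the data out, then reuse the buffer.
            self.interrupts += 1
            self.on_full(list(self.records))
            self.records.clear()


saved = []  # the "other location" that the handler copies records to
bts = BtsBufferModel(capacity=4, on_full=saved.extend)
for i in range(10):
    bts.record_branch(0x1000 + i, 0x2000 + i, predicted=(i % 2 == 0))

print(bts.interrupts, len(saved), len(bts.records))
```

Unlike the fixed-depth LBR case, no branch is lost here; the price is one extra store per branch and a handler invocation every time the buffer fills.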

So, in BTS mode there are two sources of overhead: cache/memory stores and interrupts from BTS buffer overflow.
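As a rough back-of-the-envelope illustration of how these two sources add up, the sketch below computes a relative overhead from a simple linear cost model. The function `bts_overhead_fraction` and every number passed to it are made up for illustration; they are not measurements and do not come from the paper.

```python
def bts_overhead_fraction(branches, base_cycles, store_cycles,
                          buffer_records, irq_cycles):
    """Estimate BTS overhead as a fraction of the untraced run time.

    All parameters are hypothetical inputs to a simple linear cost model:
    one extra store per traced branch, plus one interrupt per full buffer.
    """
    interrupts = branches // buffer_records
    extra_cycles = branches * store_cycles + interrupts * irq_cycles
    return extra_cycles / base_cycles


overhead = bts_overhead_fraction(
    branches=1_000_000,      # taken branches in the traced run (assumed)
    base_cycles=10_000_000,  # cycles the run takes without tracing (assumed)
    store_cycles=3,          # extra cycles per BTS record store (assumed)
    buffer_records=10_000,   # records that fit in the BTS buffer (assumed)
    irq_cycles=20_000,       # cycles to handle one buffer-full interrupt (assumed)
)
print(f"estimated BTS overhead: {overhead:.0%}")
```

In this model a larger buffer reduces the interrupt term, while the per-branch store term stays the same, so branch-heavy code pays more regardless of buffer size.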

AET uses an external agent to save debug and trace data. This agent is connected via the eXtended Debug Port (XDP) and interfaces with an In-Target Probe (ITP). According to the paper, the overhead of AET "can have a significant effect on system performance, which can be several orders of magnitude greater", because AET can generate and capture more types of events. However, the collected data is stored outside the debugged platform.

The paper's "Summary" says:

LBR has no overhead, but is very shallow (4–16 branch locations, depending on the CPU). Trace data is available immediately out of reset.

BTS is much deeper, but has an impact on CPU performance and requires on-board RAM. Trace data is available as soon as CAR is initialized.

AET requires special ITP hardware and is not available on all CPU architectures. It has the advantage of storing the trace data off board.
