Measure time to execute a single instruction


Question

Is there a way using C or assembler or maybe even C# to get an accurate measure of how long it takes to execute an ADD instruction?

Recommended answer

Yes, sort of, but it's non-trivial and produces results that are almost meaningless, at least on most reasonably modern processors.

On relatively slow processors (e.g., up through the original Pentium in the Intel line, still true on most small embedded processors) you can just look in the processor's data sheet and it'll (normally) tell you how many clock ticks to expect. Quick, simple, and easy.

On a modern desktop machine (e.g., Pentium Pro or newer), life isn't nearly that simple. These CPUs can execute a number of instructions at a time, and execute them out of order as long as there aren't any dependencies between them. This means the whole concept of the time taken by a single instruction becomes almost meaningless. The time taken to execute one instruction can and will depend on the instructions that surround it.
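To make that dependence on surrounding instructions concrete, here is a minimal sketch in C, assuming GCC or Clang on an x86 target (the inline assembly and the function names `dependent_adds`, `independent_adds` and `seconds_for` are my own illustration, not part of the original answer). Both loops execute exactly the same number of ADD instructions; the only difference is whether each ADD has to wait for the result of the one before it.

```c
#include <time.h>

/* One chain: every ADD needs the result of the previous ADD,
   so the four ADDs in an iteration have to run one after another. */
static unsigned long dependent_adds(unsigned long iters) {
    unsigned long a = 0;
    for (unsigned long i = 0; i < iters; i++) {
        __asm__ __volatile__(
            "add $1, %0\n\t"
            "add $1, %0\n\t"
            "add $1, %0\n\t"
            "add $1, %0"
            : "+r"(a) : : "cc");
    }
    return a;
}

/* Four separate chains: the ADDs inside one iteration do not depend
   on each other, so the CPU is free to overlap them. */
static unsigned long independent_adds(unsigned long iters) {
    unsigned long a = 0, b = 0, c = 0, d = 0;
    for (unsigned long i = 0; i < iters; i++) {
        __asm__ __volatile__(
            "add $1, %0\n\t"
            "add $1, %1\n\t"
            "add $1, %2\n\t"
            "add $1, %3"
            : "+r"(a), "+r"(b), "+r"(c), "+r"(d) : : "cc");
    }
    return a + b + c + d;
}

/* Coarse CPU-time measurement of one of the loops above. */
static double seconds_for(unsigned long (*fn)(unsigned long), unsigned long iters) {
    clock_t t0 = clock();
    volatile unsigned long sink = fn(iters);  /* keep the call from being dropped */
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `seconds_for(dependent_adds, n)` and `seconds_for(independent_adds, n)` with a large `n` will usually show the second loop finishing sooner on an out-of-order CPU, even though the instruction count is identical. The exact ratio depends on the processor, so treat any number you get as specific to your machine.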

That said, yes, if you really want to, you can (usually -- depending on the processor) measure something, though it's open to considerable question exactly how much it'll really mean. Even getting a result like this that's only close to meaningless instead of completely meaningless isn't trivial though. For example, on an Intel or AMD chip, you can use RDTSC to do the timing measurement itself. That, unfortunately, can be executed out of order as described above. To get meaningful results, you need to surround it by an instruction that can't be executed out of order (a "serializing instruction"). The most common choice for that is CPUID, since it's one of the few serializing instructions that's available to "user mode" (i.e., ring 3) programs. That adds a bit of a twist itself though: as documented by Intel, the first few times the processor executes CPUID, it can take longer than subsequent times. As such, they recommend that you execute it three times before you use it to serialize your timing. Therefore, the general sequence runs something like this:

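.align 16
CPUID               ; executed three times so that CPUID's own timing settles
CPUID
CPUID               ; serializes: everything earlier completes before RDTSC runs
RDTSC               ; start timestamp in EDX:EAX
; sequence under test
ADD eax, ebx
; end of sequence under test
CPUID               ; serializes again so the sequence finishes before the second read
RDTSC               ; end timestamp in EDX:EAX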

Then you compare that to a result from doing the same, but with the sequence under test removed. That's leaving out quite a few details, of course -- at minimum you need to:

  1. set the registers up correctly before each CPUID
  2. save the value in EDX:EAX after the first RDTSC
  3. subtract the result of the first RDTSC from the result of the second
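Those three steps can be sketched in C roughly as follows, assuming GCC or Clang inline assembly on an x86 target. The helper names (`serialize`, `read_tsc`, `time_empty`, `time_add`, `min_cycles`) are hypothetical, and taking the minimum over several runs is my own addition to cut down noise, not something the steps above require.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 0, used here only because it is a serializing instruction. */
static inline void serialize(void) {
    uint32_t eax = 0, ebx, ecx = 0, edx;   /* step 1: set up the input registers */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx; (void)edx;                  /* outputs are not needed */
}

/* RDTSC returns the counter split across EDX (high half) and EAX (low half). */
static inline uint64_t read_tsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Baseline: the same sequence with nothing under test. */
static uint64_t time_empty(void) {
    serialize(); serialize(); serialize();
    uint64_t start = read_tsc();           /* step 2: keep the first reading */
    serialize();
    return read_tsc() - start;             /* step 3: second reading minus first */
}

/* Same sequence with a single ADD between the two readings. */
static uint64_t time_add(void) {
    uint32_t x = 1, y = 2;
    serialize(); serialize(); serialize();
    uint64_t start = read_tsc();
    __asm__ __volatile__("addl %1, %0" : "+r"(x) : "r"(y) : "cc");
    serialize();
    return read_tsc() - start;
}

/* Smallest of several runs, to reduce noise from interrupts and cache misses. */
static uint64_t min_cycles(uint64_t (*fn)(void), int runs) {
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t = fn();
        if (t < best) best = t;
    }
    return best;
}
```

The estimate is then `(int64_t)min_cycles(time_add, 1000) - (int64_t)min_cycles(time_empty, 1000)`. Don't be surprised if it comes out as zero or even negative: the compiler may place a few extra moves inside the timed region, and the measurement overhead is far larger than the ADD itself, which is the point made earlier about how little a single-instruction timing really means.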

Also note the "align" directive I've inserted -- instruction alignment can and will affect timing as well, especially if a loop is involved.
