Measuring time difference using RDTSC - results too large

Question

I'm trying to calculate the number of CPU cycles required to run a single ASM instruction. To do this, I've created this function:

measure_register_op:
    # Calculate the time required for a movl operation

    # function setup
    pushl %ebp
    movl %esp, %ebp
    pushl %ebx
    pushl %edi

    xor %edi, %edi

    # first time measurement
    xorl %eax, %eax
    cpuid               # serialize instruction execution
    rdtsc               # result in edx:eax

    # we are measuring the instruction below
    movl %eax, %edi     

    # second time measurement
    cpuid               # serialize instruction execution
    rdtsc               # result in edx:eax

    # time difference
    sub %eax, %edi

    # move to EAX. Value of EAX is what function returns
    movl %edi, %eax

    # End of function
    popl %edi
    popl %ebx
    mov %ebp, %esp
    popl %ebp

    ret

I'm using it in a *.c file:

#include <stdio.h>

extern unsigned int measure_register_op(void);

int main(void)
{

    for (int a = 0; a < 10; a++)
    {
        printf("Instruction took %u cycles \n", measure_register_op());
    }

    return 0;
}

The problem is: the values I see are way too large. I'm getting 3684414156 now. What could go wrong here?

I changed from EBX to EDI, but the result is still similar. It has to be something with rdtsc itself. In the debugger I can see that the second measurement is 0x7f61e078 and the first is 0x42999940, which after subtraction still gives 1019758392.

Here is my makefile. Maybe I'm compiling it incorrectly:

compile: measurement.s measurement.c
    gcc -g measurement.s measurement.c -o ./build/measurement -m32

Here is the exact output I see:

Instruction took 4294966680 cycles 
Instruction took 4294966696 cycles 
Instruction took 4294966688 cycles 
Instruction took 4294966672 cycles 
Instruction took 4294966680 cycles 
Instruction took 4294966688 cycles 
Instruction took 4294966688 cycles 
Instruction took 4294966696 cycles 
Instruction took 4294966688 cycles 
Instruction took 4294966680 cycles 

Answer

In your updated version, which no longer clobbers the start time (the bug @R. pointed out):

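In AT&T syntax, sub %eax, %edi computes %edi = %edi - %eax. Since %edi holds the start time and %eax holds the end time, that is start - end. This is a negative number, i.e. a huge unsigned number just below 2^32. If you're going to print with %u, get used to interpreting its output back into a bit pattern when debugging.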
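
A small C sketch of the wraparound described above. The helper names and the sample start/end values are made up for illustration, chosen so that end - start is 616. Subtracting in the wrong order with 32-bit unsigned arithmetic wraps to just below 2^32, and reinterpreting the same bits as a signed 32-bit integer recovers the small negative number. For instance, 4294966680 in the question's output is 2^32 - 616, i.e. -616 read as signed.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the bug: start - end in 32-bit
   unsigned arithmetic, like "sub %eax, %edi" with start in EDI. */
static uint32_t elapsed_wrong(uint32_t start, uint32_t end)
{
    return start - end; /* wraps modulo 2^32 when end > start */
}

/* Reinterpret the same 32 bits as a two's-complement signed value.
   Converting an out-of-range value to int32_t is implementation-defined
   in C; GCC and Clang define it as reduction modulo 2^32. */
static int32_t as_signed(uint32_t bits)
{
    return (int32_t)bits;
}
```

Printing a suspicious %u value with %d (or passing it through as_signed) is a quick way to spot a reversed subtraction.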

You want end - start.
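
A minimal sketch of the corrected arithmetic, assuming you keep both halves that RDTSC writes to EDX:EAX rather than only the low 32 bits. The function names are illustrative, not from the original. Each reading is combined into a 64-bit value and the start is subtracted from the end. Unsigned subtraction gives the right answer even if the low half wrapped between the two readings, as long as the interval is shorter than the counter's range. Keeping all 64 bits makes that range effectively unlimited.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild the 64-bit TSC value from the EDX (high)
   and EAX (low) halves that RDTSC writes. */
static uint64_t tsc_from_halves(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Elapsed ticks: end minus start, never the other way round.
   Unsigned subtraction stays correct across a wrap of the low half. */
static uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

For example, the two low halves seen in the question's debugger session, 0x42999940 (start) and 0x7f61e078 (end), give end - start = 1019758392 when the high halves are equal.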

And BTW, use lfence; it's significantly more efficient than cpuid. It's guaranteed to serialize instruction execution on Intel (without flushing the store buffer like a full serializing instruction). It's also safe on AMD CPUs with Spectre mitigation enabled.
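
A minimal sketch of the lfence approach in C. It assumes GCC or Clang on x86-64, using the compiler intrinsics _mm_lfence() and __rdtsc() from <x86intrin.h>. The wrapper names fenced_rdtsc and time_call are made up for illustration. The first LFENCE keeps RDTSC from reading the counter before earlier instructions have completed. The second keeps later instructions from starting before the read.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical wrapper: read the TSC with LFENCE on both sides so the
   read is ordered with respect to the surrounding instructions. */
static inline uint64_t fenced_rdtsc(void)
{
    _mm_lfence();            /* wait for earlier instructions to complete */
    uint64_t t = __rdtsc();  /* EDX:EAX already combined into 64 bits */
    _mm_lfence();            /* keep later instructions from starting early */
    return t;
}

/* Hypothetical example: time a call, in reference ticks. */
static uint64_t time_call(void (*fn)(void))
{
    uint64_t start = fenced_rdtsc();
    fn();
    uint64_t end = fenced_rdtsc();
    return end - start;      /* end - start, not start - end */
}
```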

See also http://akaros.cs.berkeley.edu/lxr/akaros/kern/arch/x86/rdtsc_test.c for some different ways to serialize RDTSC and/or RDTSCP.
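
One such variant, sketched under the same assumptions (GCC or Clang on x86-64, intrinsics from <x86intrin.h>; the function names are illustrative). The start of the region uses LFENCE; RDTSC; LFENCE, and the end uses RDTSCP followed by LFENCE. RDTSCP waits until all earlier instructions have executed before reading the counter, but later instructions may still start before the read, hence the trailing LFENCE. RDTSCP also returns the IA32_TSC_AUX register, which an OS may set to identify the core.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical helper for the start of a timed region:
   LFENCE; RDTSC; LFENCE */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical helper for the end of a timed region: RDTSCP waits for
   earlier instructions to execute; the trailing LFENCE keeps later
   instructions from starting before the counter is read.
   *aux receives IA32_TSC_AUX (on Linux it encodes the CPU number). */
static inline uint64_t tsc_end(unsigned int *aux)
{
    uint64_t t = __rdtscp(aux);
    _mm_lfence();
    return t;
}
```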

See also the question "Get CPU cycle count?" for more about RDTSC, especially the fact that it doesn't count core clock cycles, only reference cycles (ticks at a constant frequency on modern CPUs). So idle periods and turbo will affect your results.
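
Because RDTSC ticks at a constant reference frequency rather than at the current core clock (on CPUs with an invariant TSC), a tick count converts to elapsed time, not to core cycles. Here is a minimal sketch of that conversion. It assumes you already know the TSC frequency: tsc_hz is a value you must supply, for example by calibrating against a system clock. The function name is illustrative.

```c
#include <stdint.h>

/* Hypothetical helper: convert a TSC tick delta to nanoseconds, given
   the (assumed known, constant) TSC frequency in Hz. Splitting into
   whole seconds and a remainder avoids overflowing ticks * 1e9 for
   long intervals; valid for tsc_hz up to about 18 GHz. */
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_hz)
{
    uint64_t secs = ticks / tsc_hz;
    uint64_t rem  = ticks % tsc_hz;
    return secs * 1000000000ull + rem * 1000000000ull / tsc_hz;
}
```

When the core runs at a turbo frequency above the reference frequency, one tick corresponds to more than one core cycle. When the core is clocked down, one tick corresponds to less than one.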

Also, the cost of one instruction isn't one-dimensional: it has a latency, a throughput and a uop count. Timing a single instruction with RDTSC like that isn't particularly useful. See the question "RDTSCP in NASM always returns the same value" for more about how to measure throughput, latency and uops for a single instruction.

RDTSC can be useful for timing a whole loop or a longer sequence of instructions, one larger than your CPU's out-of-order (OoO) execution window.
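
A minimal sketch of timing a whole loop instead of one instruction. It assumes GCC or Clang on x86-64 with the intrinsics from <x86intrin.h>. The function name, the multiply-add loop body and the iteration count are arbitrary choices for illustration. Each iteration depends on the previous one, so the average roughly reflects the chain's latency, still in reference ticks rather than core cycles. Seeding the chain from the start reading keeps the loop from being hoisted above it, but check the generated assembly to confirm the compiler kept the loop between the two reads.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical benchmark: average TSC ticks per iteration of a
   dependent multiply-add chain. Requires iters > 0. */
static double ticks_per_iteration(uint64_t iters)
{
    volatile uint32_t sink;            /* keeps the result (and the loop) alive */

    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();

    uint32_t x = (uint32_t)start;      /* chain depends on the first read */
    for (uint64_t i = 0; i < iters; i++)
        x = x * 2654435761u + 1u;      /* each step depends on the previous one */

    _mm_lfence();
    uint64_t end = __rdtsc();
    _mm_lfence();

    sink = x;
    (void)sink;
    return (double)(end - start) / (double)iters;
}
```

A large iteration count (for example, a few million) amortizes the cost of the fences and RDTSC itself over the loop.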