Issues of gcc optimization with rdtsc


Question

I am using the rdtsc and cpuid instructions (via volatile inline assembly) to measure the CPU cycles of a program. The rdtsc instruction gives realistic results for my programs on Linux (with speed optimization -O2 -fomit-frame-pointer) and on Windows (using the speed optimization options of the C compiler in MS Visual Studio 2008; I think it is VC 9.0).

Recently, I implemented a new program that uses a lot of table lookups and similar operations. However, the rdtsc measurements of this program built with gcc optimization on Linux are always wrong: the number of CPU cycles is far smaller than I expect. The rdtsc measurements of the same program running on Windows (compiled with the optimizations and compiler mentioned above) are realistic and agree with our expectations.

My question: is there any way that gcc optimization could move the volatile assembly instructions somewhere else and so produce the behaviour described above?

My code for the timers is given below:

#define TIMER_VARS                                                 \
  uint32 start_lo, start_hi;                                       \
  uint32 ticks_lo, ticks_hi

#define TIMER_START()                                              \
  __asm__ __volatile__                                             \
     ("rdtsc"                                                      \
     : "=a" (start_lo), "=d" (start_hi) /* a = eax, d = edx*/      \
     : /* no input parameters*/                                    \
     : "%ebx", "%ecx", "memory")

#define TIMER_STOP()                                               \
  __asm__ __volatile__                                             \
     ("rdtsc"                                                      \
     "\n        subl %2, %%eax"                                    \
     "\n        sbbl %3, %%edx"                                    \
     : "=&a" (ticks_lo), "=&d" (ticks_hi)                          \
     : "g" (start_lo), "g" (start_hi)                              \
     : "%ebx", "%ecx", "memory")

I would be very thankful if somebody could suggest some ideas on this.

Thanks,

Recommended answer

To prevent an inline rdtsc function from being moved across loads, stores, and other operations, you should both write the asm as __asm__ __volatile__ and include "memory" in the clobber list. Without the "memory" clobber, GCC is still prevented from removing the asm or moving it across any instruction that could need its results (or change its inputs), but it may move the asm relative to unrelated operations. The "memory" clobber means GCC cannot assume that memory contents (any variable whose address has potentially leaked) remain the same across the asm, so moving it becomes much more difficult. However, GCC may still be able to move the asm across instructions that only modify local variables whose address was never taken, since those are not "memory".
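As a rough illustration of the advice above, here is a minimal sketch of a timer helper that combines __volatile__ with a "memory" clobber. It assumes an x86 or x86-64 target and a GCC-compatible compiler. The names read_tsc, table_lookup_sum, and time_table_lookup are hypothetical and not from the original post, and the workload is only a stand-in for the table-lookup code. Storing the result through a volatile variable between the two reads is one possible way to tie the work to memory; it is an assumption of this sketch, not something the answer prescribes, and it reduces rather than eliminates the compiler's freedom to reorder.

```c
#include <assert.h>
#include <stdint.h>

/* Read the time-stamp counter (x86/x86-64 only, GCC-style inline asm).
 * "volatile" stops the compiler from deleting the asm; the "memory"
 * clobber stops it from assuming memory is unchanged across the asm. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc"
                         : "=a"(lo), "=d"(hi) /* outputs: eax, edx */
                         :                    /* no inputs */
                         : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical workload standing in for the table-lookup code. */
static uint32_t table_lookup_sum(const uint8_t *table, uint32_t rounds)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < rounds; i++)
        sum += table[(i * 7u) & 0xFFu]; /* index stays within 0..255 */
    return sum;
}

/* Time the workload over a 256-entry table. The sum is written to a
 * volatile variable between the two counter reads, so the store is a
 * memory access that may not cross the "memory"-clobbering asm. */
static uint64_t time_table_lookup(const uint8_t *table, uint32_t rounds,
                                  uint32_t *result)
{
    assert(table && result);

    volatile uint32_t sink;
    uint64_t start = read_tsc();
    sink = table_lookup_sum(table, rounds);
    uint64_t stop = read_tsc();

    *result = sink;
    return stop - start;
}
```

If the table were a compile-time constant visible to the compiler, GCC could in principle still fold the whole sum before the first read, which is the caveat about non-memory values mentioned above; inspecting the generated assembly remains the reliable check.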

Oh, and as wildplasser said in a comment, check the asm output before you waste a lot of time on this.
