Is clock_gettime() adequate for submicrosecond timing?

Problem description

I need a high-resolution timer for the embedded profiler in the Linux build of our application. Our profiler measures scopes as small as individual functions, so it needs a timer precision of better than 25 nanoseconds.

Previously our implementation used inline assembly and the rdtsc operation to query the high-frequency timer from the CPU directly, but this is problematic and requires frequent recalibration.
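
"Recalibration" here most likely means re-measuring how many TSC ticks correspond to a nanosecond, since raw rdtsc values are ticks, not time. The sketch below shows one way to do that: spin for a known interval and compare the TSC delta against the CLOCK_MONOTONIC delta. It assumes x86/x86-64 with GCC or Clang (for the `__rdtsc()` intrinsic in `<x86intrin.h>`); the helper names `mono_ns`, `tsc_ticks_per_ns` and `tsc_delta_to_ns` are hypothetical, not part of the original code.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() and CLOCK_MONOTONIC */
#include <stdint.h>
#include <time.h>
#include <x86intrin.h>           /* __rdtsc(): GCC/Clang intrinsic, x86/x86-64 only */

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t mono_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: estimate TSC ticks per nanosecond by spinning for at
   least interval_ns and dividing the TSC delta by the CLOCK_MONOTONIC delta.
   Longer intervals shrink the relative error of the bracketing reads. */
static double tsc_ticks_per_ns(int64_t interval_ns) {
    int64_t t0 = mono_ns();
    uint64_t c0 = __rdtsc();
    int64_t t1;
    uint64_t c1;
    do {
        c1 = __rdtsc();
        t1 = mono_ns();
    } while (t1 - t0 < interval_ns);
    return (double)(c1 - c0) / (double)(t1 - t0);
}

/* Hypothetical helper: convert a TSC delta to nanoseconds with a measured rate. */
static double tsc_delta_to_ns(uint64_t ticks, double ticks_per_ns) {
    return (double)ticks / ticks_per_ns;
}
```

If the tick rate drifts (for example on older CPUs whose TSC frequency follows power states), the rate measured once at startup goes stale, which is one plausible source of the "frequent recalibration" problem.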

So I tried using the clock_gettime function instead to query CLOCK_PROCESS_CPUTIME_ID. The docs allege this gives me nanosecond timing, but I found that the overhead of a single call to clock_gettime() was over 250ns. That makes it impossible to time events 100ns long, and having such high overhead on the timer function seriously drags down app performance, distorting the profiles beyond value. (We have hundreds of thousands of profiling nodes per second.)
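
To put the 250 ns figure in perspective, here is a rough overhead budget under two assumptions that are not stated in the question: the profiler reads the timer twice per scope (entry and exit), and it records 200,000 scopes per second (one reading of "hundreds of thousands"). The second line repeats the calculation at the 25 ns target.

```latex
\begin{aligned}
2 \times (2\times10^{5}\,\text{scopes/s}) \times 250\,\text{ns} &= 4\times10^{5}\,\text{s}^{-1} \times 2.5\times10^{-7}\,\text{s} = 0.1 \quad (\approx 10\%\ \text{of each second}) \\
2 \times (2\times10^{5}\,\text{scopes/s}) \times 25\,\text{ns} &= 4\times10^{5}\,\text{s}^{-1} \times 2.5\times10^{-8}\,\text{s} = 0.01 \quad (\approx 1\%\ \text{of each second})
\end{aligned}
```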

Is there a way to call clock_gettime() with less than ¼ μs of overhead? Or is there some other way to reliably read the time stamp counter with under 25 ns of overhead? Or am I stuck with rdtsc?

Below is the code I used to time clock_gettime().

// calls gettimeofday() to return wall-clock time in seconds:
extern double Get_FloatTime();
enum { TESTRUNS = 1024*1024*8 };  // 8388608, matching the iteration count in the results below

// time the high-frequency timer against the wall clock
{
    double fa = Get_FloatTime();
    timespec spec; 
    clock_getres( CLOCK_PROCESS_CPUTIME_ID, &spec );
    printf("CLOCK_PROCESS_CPUTIME_ID resolution: %ld sec %ld nano\n", 
            spec.tv_sec, spec.tv_nsec );
    for ( int i = 0 ; i < TESTRUNS ; ++ i )
    {
        clock_gettime( CLOCK_PROCESS_CPUTIME_ID, &spec );
    }
    double fb = Get_FloatTime();
    printf( "clock_gettime %d iterations : %.6f msec %.3f microsec / call\n",
        TESTRUNS, ( fb - fa ) * 1000.0, (( fb - fa ) * 1000000.0) / TESTRUNS );
}
// and so on for CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_THREAD_CPUTIME_ID.

Results:

CLOCK_PROCESS_CPUTIME_ID resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 3115.784947 msec 0.371 microsec / call
CLOCK_MONOTONIC resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 2505.122119 msec 0.299 microsec / call
CLOCK_REALTIME resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 2456.186031 msec 0.293 microsec / call
CLOCK_THREAD_CPUTIME_ID resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 2956.633930 msec 0.352 microsec / call

This is on a standard Ubuntu kernel. The app is a port of a Windows app (where our rdtsc inline assembly works just fine).
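
For anyone reproducing the measurement above, this is a self-contained sketch of the same benchmark. It swaps the question's `Get_FloatTime()` (gettimeofday-based) for CLOCK_MONOTONIC, which avoids the step adjustments a wall-clock source can see. The function names `ts_to_ns`, `clock_gettime_cost_ns` and `report` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime(), clock_getres() */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Convert a timespec to integer nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts) {
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Average cost in nanoseconds of one clock_gettime(clk, ...) call over
   `runs` back-to-back calls; the loop itself is timed with CLOCK_MONOTONIC. */
static double clock_gettime_cost_ns(clockid_t clk, long runs) {
    struct timespec start, end, scratch;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < runs; ++i)
        clock_gettime(clk, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(ts_to_ns(&end) - ts_to_ns(&start)) / (double)runs;
}

/* Print one line per clock: its reported resolution and measured cost. */
static void report(const char *name, clockid_t clk, long runs) {
    struct timespec res;
    clock_getres(clk, &res);
    printf("%-26s resolution %ld ns, %.1f ns/call\n",
           name, (long)res.tv_nsec, clock_gettime_cost_ns(clk, runs));
}
```

Calling `report()` once each for CLOCK_PROCESS_CPUTIME_ID, CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_THREAD_CPUTIME_ID mirrors the original loop. Results depend heavily on the kernel: on many modern x86-64 kernels, CLOCK_REALTIME and CLOCK_MONOTONIC are served in user space through the vDSO while the CPU-time clocks still make a real system call, so the gap between clocks may be much larger than in the table above.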

Does x86-64 GCC have some intrinsic equivalent to __rdtsc(), so I can at least avoid inline assembly?

Recommended answer

No. You'll have to use platform-specific code to do it. On x86 and x86-64, you can use 'rdtsc' to read the Time Stamp Counter.

Just port the rdtsc assembly you're using.

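#include <stdint.h>

/* x86-64 only: the clobber list names 64-bit registers.
   "static" avoids the C99 link error a plain inline definition can cause. */
static __inline__ uint64_t rdtsc(void) {
  uint32_t lo, hi;
  __asm__ __volatile__ (      // serialize: cpuid waits for earlier instructions to complete
  "xorl %%eax,%%eax \n        cpuid"
  ::: "%rax", "%rbx", "%rcx", "%rdx");
  /* We cannot use "=A", since this would use %rax on x86_64 and return only the lower 32 bits of the TSC */
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return (uint64_t)hi << 32 | lo;
}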
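
On the question's last point: current GCC and Clang releases do provide `__rdtsc()` through `<x86intrin.h>`, so the inline assembly can be avoided. The sketch below uses it, with `_mm_lfence()` in place of the CPUID serialization; `read_tsc` is a hypothetical name. Intel documents executing LFENCE immediately before RDTSC so that RDTSC waits for earlier instructions; on AMD processors this ordering depends on the CPU and kernel configuring LFENCE as dispatch-serializing, so treat it as an assumption there.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang: declares __rdtsc() and _mm_lfence() */

/* Hypothetical helper: read the Time Stamp Counter via compiler intrinsics.
   LFENCE is cheaper than CPUID; see the lead-in for its ordering caveats. */
static inline uint64_t read_tsc(void) {
    _mm_lfence();
    return (uint64_t)__rdtsc();
}
```

The returned value is still a raw tick count, so converting it to time needs a measured ticks-per-nanosecond rate, just as with the inline-assembly version.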
