Is clock_gettime() adequate for submicrosecond timing?

Views: 62 · Published: 2021/5/29 22:20:19 · Tags: linux performance ubuntu profiling

Problem description

I need a high-resolution timer for the embedded profiler in the Linux build of our application. Our profiler measures scopes as small as individual functions, so it needs a timer precision of better than 25 nanoseconds.

Previously our implementation used inline assembly and the rdtsc operation to query the high-frequency timer from the CPU directly, but this is problematic and requires frequent recalibration.

So I tried using the clock_gettime function instead to query CLOCK_PROCESS_CPUTIME_ID. The docs allege this gives me nanosecond timing, but I found that the overhead of a single call to clock_gettime() was over 250ns. That makes it impossible to time events 100ns long, and having such high overhead on the timer function seriously drags down app performance, distorting the profiles beyond value. (We have hundreds of thousands of profiling nodes per second.)

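Is there a way to call clock_gettime() with less than ¼μs of overhead? Or is there some other way to reliably read the timestamp counter with less than 25ns of overhead? Or am I stuck with using rdtsc?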

Below is the code I used to time clock_gettime().

// calls gettimeofday() to return wall-clock time in seconds:
extern double Get_FloatTime();
enum { TESTRUNS = 1024*1024*4 };

// time the high-frequency timer against the wall clock
{
    double fa = Get_FloatTime();
    timespec spec; 
    clock_getres( CLOCK_PROCESS_CPUTIME_ID, &spec );
    printf("CLOCK_PROCESS_CPUTIME_ID resolution: %ld sec %ld nano\n", 
            spec.tv_sec, spec.tv_nsec );
    for ( int i = 0 ; i < TESTRUNS ; ++ i )
    {
        clock_gettime( CLOCK_PROCESS_CPUTIME_ID, &spec );
    }
    double fb = Get_FloatTime();
    printf( "clock_gettime %d iterations : %.6f msec %.3f microsec / call\n",
        TESTRUNS, ( fb - fa ) * 1000.0, (( fb - fa ) * 1000000.0) / TESTRUNS );
}
// and so on for CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_THREAD_CPUTIME_ID.

Results:

CLOCK_PROCESS_CPUTIME_ID resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 3115.784947 msec 0.371 microsec / call
CLOCK_MONOTONIC resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 2505.122119 msec 0.299 microsec / call
CLOCK_REALTIME resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 2456.186031 msec 0.293 microsec / call
CLOCK_THREAD_CPUTIME_ID resolution: 0 sec 1 nano
clock_gettime 8388608 iterations : 2956.633930 msec 0.352 microsec / call
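
For anyone who wants to reproduce the measurement, here is a minimal self-contained sketch of the same idea. It is not the original harness: it times the loop against CLOCK_MONOTONIC rather than the gettimeofday()-based Get_FloatTime(), and the helper names to_ns, ns_per_call and report are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <time.h>

// Convert a timespec to a 64-bit nanosecond count.
long long to_ns(const timespec &ts)
{
    return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Average cost, in nanoseconds, of one clock_gettime(id) call.
// The loop is timed against CLOCK_MONOTONIC (an assumption of this sketch;
// the original code used a gettimeofday() wrapper instead).
double ns_per_call(clockid_t id, int runs)
{
    assert(runs > 0);
    timespec begin, end, scratch;
    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (int i = 0; i < runs; ++i)
        clock_gettime(id, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return static_cast<double>(to_ns(end) - to_ns(begin)) / runs;
}

// Print the reported resolution and the measured per-call cost of one clock.
void report(const char *name, clockid_t id, int runs)
{
    timespec res;
    clock_getres(id, &res);
    printf("%s resolution: %ld sec %ld nano, %.1f ns / call\n",
           name, static_cast<long>(res.tv_sec), res.tv_nsec,
           ns_per_call(id, runs));
}
```

Calling report("CLOCK_PROCESS_CPUTIME_ID", CLOCK_PROCESS_CPUTIME_ID, TESTRUNS) from main(), and likewise for the other three clock ids, should print one line per clock comparable to the results above. The absolute numbers will depend on the kernel, the clock source and the hardware.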

This is on a standard Ubuntu kernel. The app is a port of a Windows app (where our rdtsc inline assembly works just fine).

Does x86-64 GCC have some intrinsic equivalent to __rdtsc(), so I can at least avoid inline assembly?
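
On the intrinsic: as far as I know, GCC (and Clang) declare __rdtsc() in <x86intrin.h>, so inline assembly should not be needed on x86 targets. A minimal sketch, in which read_tsc and ScopedTicks are hypothetical names used only for illustration. It returns raw ticks, not nanoseconds, so it does nothing about the calibration problem mentioned earlier.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   // GCC/Clang header that declares __rdtsc()
#endif

// Read the CPU time-stamp counter without inline assembly.
// On non-x86 targets the intrinsic does not exist, so this returns 0;
// that fallback is an assumption made only so the sketch compiles everywhere.
inline std::uint64_t read_tsc()
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

// Hypothetical scope timer: stores the elapsed TSC ticks in *out
// when the object goes out of scope.
struct ScopedTicks {
    std::uint64_t *out;
    std::uint64_t start;
    explicit ScopedTicks(std::uint64_t *o) : out(o), start(read_tsc()) {}
    ~ScopedTicks() { *out = read_tsc() - start; }
};
```

A profiling node would declare a ScopedTicks at the top of the scope it measures and read the tick count afterwards. Converting ticks to time still requires knowing the TSC frequency, which is exactly the part that needed recalibration in the first place.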
