ARM performance counters vs Linux clock_gettime

Published: 2020/4/23 10:37:35 · Tags: linux, arm, performancecounter, gettime, time-measurement

Question

I am using a Zynq chip on a development board (ZC702), which has a dual Cortex-A9 MPCore at 667 MHz and comes with Linux kernel 3.3. I wanted to compare the execution time of a program. I first used clock_gettime and then used the counters provided by the ARM coprocessor (CP15). The counter increments once every processor cycle (based on this Stack Overflow question and this one).
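Below is a minimal sketch of how two clock_gettime() samples can be turned into an interval in microseconds. It assumes both samples were taken with the same clock ID. The helper name `timespec_elapsed_us` is hypothetical and not part of any library.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: elapsed time between two clock_gettime() samples,
   in microseconds. stop->tv_nsec may be smaller than start->tv_nsec; the
   signed subtraction below handles that borrow automatically. */
double timespec_elapsed_us(const struct timespec *start,
                           const struct timespec *stop)
{
    assert(stop->tv_sec >= start->tv_sec); /* samples taken in order */
    return (double)(stop->tv_sec - start->tv_sec) * 1e6 +
           (double)(stop->tv_nsec - start->tv_nsec) / 1e3;
}
```

For example, samples of 1.900000000 s and 2.631627126 s give 731627.126 µs.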

I compile the program with the -O0 flag (since I don't want any reordering or optimization done).

The time I measure with the performance counters is 583833498 cycles / 666.666687 MHz = 875750.221 microseconds.
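The conversion above relies on 1 MHz being one cycle per microsecond, so dividing a cycle count by the frequency in MHz yields microseconds directly. Below is a minimal sketch of that arithmetic. The function name `cycles_to_us` is hypothetical, and it assumes the counter ticks once per core cycle at a fixed frequency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to microseconds for a core
   clocked at cpu_mhz (cycles / (cycles per microsecond) = microseconds).
   Assumes one counter tick per core cycle and no frequency scaling. */
double cycles_to_us(uint64_t cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}
```

With the figures from the question, `cycles_to_us(583833498, 666.666687)` returns about 875750.22 µs, which matches the value above to within rounding.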

When using clock_gettime() (with CLOCK_REALTIME, CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW), the measured time is 731627.126 microseconds. That is roughly 144,000 microseconds (about 150 ms) less.
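To make the size of the discrepancy concrete, the sketch below computes three quantities from the two measurements of the same run:

- the absolute gap,
- the ratio between them,
- the core frequency at which the cycle count and the clock_gettime() interval would agree.

The names `struct timing_gap` and `compare_timings` are hypothetical. This only quantifies the mismatch; it does not say which clock is wrong.

```c
#include <assert.h>

/* Hypothetical summary of how two measurements of one run disagree. */
struct timing_gap {
    double diff_us;     /* cycle-counter time minus clock_gettime() time */
    double ratio;       /* cycle-counter time / clock_gettime() time */
    double implied_mhz; /* frequency at which both measurements agree */
};

struct timing_gap compare_timings(double pmu_us, double clock_us, double cycles)
{
    struct timing_gap g;
    assert(clock_us > 0.0);
    g.diff_us = pmu_us - clock_us;
    g.ratio = pmu_us / clock_us;
    g.implied_mhz = cycles / clock_us; /* cycles per microsecond = MHz */
    return g;
}
```

With the question's figures, `compare_timings(875750.221, 731627.126, 583833498.0)` gives:

- a gap of about 144123.1 µs,
- a ratio of about 1.197,
- an implied frequency of about 798 MHz.

In other words, the two time sources disagree by roughly 20 %.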

Can anybody explain why this is happening? Why is there a difference? The processor does not use clock scaling, so how can clock_gettime measure a shorter execution time? Sample code is below:

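#define _GNU_SOURCE   /* enable POSIX and Linux-specific declarations */
#include <stdio.h>
#include <time.h>     /* glibc < 2.17: link with -lrt for clock_gettime */

#define RUNS 50000000
/* RUNS iterations of: add, ten no-op movs, sub, cmp, bne.
   "cc" is clobbered because cmp updates the condition flags. */
#define BENCHMARK(val) \
__asm__ __volatile__("mov r4, %1\n\t" \
                 "mov r5, #0\n\t" \
                 "1:\n\t" \
                 "add r5, r5, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "mov r4, r4\n\t" \
                 "sub r4, r4, #1\n\t" \
                 "cmp r4, #0\n\t" \
                 "bne 1b\n\t" \
                 "mov %0, r5\n\t" \
                 : "=r" (val) \
                 : "r" (RUNS) \
                 : "r4", "r5", "cc" \
        )

int main(void)
{
    struct timespec start, stop;
    unsigned int start_cycles, end_cycles, i;
    int index;

    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    /* PMCCNTR: needs the PMU enabled and user access granted
       (PMUSERENR) from kernel mode, otherwise this raises SIGILL. */
    __asm__ __volatile__ ("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(start_cycles));
    for (index = 0; index < 5; index++)
    {
        BENCHMARK(i);
    }
    __asm__ __volatile__ ("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(end_cycles));
    clock_gettime(CLOCK_MONOTONIC_RAW, &stop);

    printf("cycles: %u\n", end_cycles - start_cycles);
    printf("clock_gettime: %.3f us\n",
           (stop.tv_sec - start.tv_sec) * 1e6 +
           (stop.tv_nsec - start.tv_nsec) / 1e3);
    printf("result: %u\n", i);
    return 0;
}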
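One detail of the cycle-counter side: PMCCNTR on the Cortex-A9 is a 32-bit register, so `end_cycles - start_cycles` relies on unsigned wraparound. The sketch below shows the wrap-safe delta and how long the counter lasts before wrapping. The names `cycle_delta` and `wrap_period_s` are hypothetical, and the code assumes the counter ticks once per core cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned subtraction modulo 2^32 gives the right
   delta even if the 32-bit counter wrapped once between the two reads.
   It cannot detect two or more wraps. */
uint32_t cycle_delta(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Hypothetical helper: seconds until a 32-bit counter that increments
   at cpu_hz wraps around. */
double wrap_period_s(double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return 4294967296.0 / cpu_hz; /* 2^32 ticks */
}
```

At 666.666687 MHz the counter wraps about every 6.44 s. The 583833498-cycle run in the question (about 0.88 s) is well inside one period, so wraparound does not account for the gap; it only matters for longer measurements.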