ARM performance counters vs Linux clock_gettime

Question

I am using a Zynq chip on a development board (ZC702), which has a dual-core Cortex-A9 MPCore at 667 MHz and runs Linux kernel 3.3. I wanted to measure the execution time of a program, so I first used clock_gettime and then the cycle counter provided by the ARM coprocessor, which increments once per processor cycle (based on this Stack Overflow question and this one).

I compile the program with the -O0 flag, since I don't want any reordering or optimization.

The time I measure with the performance counters is 583833498 cycles / 666.666687 MHz = 875750.221 microseconds.

With clock_gettime() (CLOCK_REALTIME, CLOCK_MONOTONIC, or CLOCK_MONOTONIC_RAW) the measured time is 731627.126 microseconds, which is about 144,000 microseconds less.
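The two conversions behind these figures can be written out as a small sketch. The helper names `cycles_to_us` and `timespec_diff_us` are hypothetical (they do not appear in the question), and the 666.666687 MHz value is simply the one quoted above.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: convert a cycle count to microseconds.
 * Dividing cycles by a frequency in MHz yields microseconds directly. */
double cycles_to_us(uint32_t cycles, double cpu_mhz)
{
    return (double)cycles / cpu_mhz;
}

/* Hypothetical helper: elapsed microseconds between two timespec values,
 * as returned by clock_gettime(). */
double timespec_diff_us(struct timespec start, struct timespec stop)
{
    double sec  = (double)(stop.tv_sec - start.tv_sec);
    double nsec = (double)(stop.tv_nsec - start.tv_nsec);
    return sec * 1e6 + nsec / 1e3;
}
```

With the numbers from the question, `cycles_to_us(583833498u, 666.666687)` gives roughly 875750.22 microseconds. Note that the ARMv7 cycle counter is 32 bits wide, so at about 667 MHz it wraps roughly every 6.4 seconds; an unsigned `end - start` is only meaningful for intervals shorter than that.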

Can anybody explain why this is happening? Why is there a difference? The processor does not scale its clock, so how can clock_gettime report a shorter execution time? Sample code is below:

#include <time.h>   /* clock_gettime, struct timespec */

#define RUNS 50000000
/* Busy loop: RUNS iterations of one add, ten no-op moves, and the loop
 * bookkeeping. The running sum ends up in val. */
#define BENCHMARK(val) \
__asm__  __volatile__("mov r4, %1\n\t" \
                 "mov r5, #0\n\t" \
                 "1:\n\t"\
                 "add r5,r5,r4\n\t"\
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "mov r4 ,r4  \n\t" \
                 "sub r4,r4,#1\n\t" \
                 "cmp r4, #0\n\t" \
                 "bne 1b\n\t" \
                 "mov %0 ,r5  \n\t" \
                 :"=r" (val) \
                 : "r" (RUNS) \
                 : "r4","r5","cc" /* cmp modifies the condition flags */ \
        );
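/* Declarations assumed; they are not shown in the original excerpt. */
struct timespec start, stop;
unsigned int start_cycles, end_cycles;  /* the cycle counter is 32 bits wide */
int index, i;

clock_gettime(CLOCK_MONOTONIC_RAW,&start);
/* Read PMCCNTR, the cycle counter, through CP15 (c9, c13, 0) */
__asm__ __volatile__ ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(start_cycles));
for(index=0;index<5;index++)
{
    BENCHMARK(i);
}
__asm__ __volatile__ ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(end_cycles));
clock_gettime(CLOCK_MONOTONIC_RAW,&stop);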

Recommended answer

I found the solution. I upgraded the platform from Linux kernel 3.3.0 to 3.5, and the value is now similar to that of the performance counters. Apparently kernel 3.3.0 assumes the clock counter runs faster than it actually does (around 400 MHz, rather than half the CPU frequency). It was probably a porting error in the older version.
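As a rough consistency check of this explanation, the sketch below assumes the timer really ticks at half the CPU frequency (about 333.33 MHz) while the kernel converts ticks to time as if it ran at 400 MHz. The function name `reported_us` is hypothetical, and the exact frequencies the kernel used are an assumption based on the wording of the answer.

```c
/* Hypothetical helper: the elapsed time a clock would report if its tick
 * counter really runs at actual_mhz but the ticks are converted to time
 * using assumed_mhz. All times are in microseconds. */
double reported_us(double true_us, double actual_mhz, double assumed_mhz)
{
    double ticks = true_us * actual_mhz;  /* ticks elapsed (us * MHz) */
    return ticks / assumed_mhz;           /* time the kernel would report */
}
```

Calling `reported_us(875750.221, 666.666687 / 2.0, 400.0)` gives about 729,792 microseconds, which is within roughly 0.3% of the 731627.126 microseconds measured with clock_gettime, so the figures in the question are consistent with a 400 MHz assumption.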
