How to measure program execution time in ARM Cortex-A8 processor?


Problem description

I'm using an ARM Cortex-A8 based processor, the i.MX515, running the Ubuntu 9.10 Linux distribution. I'm running a very big application written in C, and I use gettimeofday() to measure the time my application takes.

#include <sys/time.h>

int main(void)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    ....
    ....
    ....
    gettimeofday(&end, NULL);
    return 0;
}

This method was sufficient to see which blocks of my application took how much time. But now that I'm trying to optimize my code very thoroughly, the gettimeofday() timings fluctuate a lot between successive runs (runs before and after my optimizations), so I can't determine the actual execution times, and hence the impact of my improvements.

Can anyone suggest what I should do?

If accessing the cycle counter is the way to go (an idea suggested on the ARM website for the Cortex-M3), can anyone point me to some code that shows the steps I have to follow to access the timer registers on the Cortex-A8?

If this method is not very accurate then please suggest some alternatives.

Thanks


Follow ups

Follow up 1: I wrote the following program with CodeSourcery. The executable was generated, but when I tried running it on the board I got an "Illegal instruction" message :(

#include <stdint.h>
#include <stdio.h>

static inline unsigned int get_cyclecount (void)
{
    unsigned int value;
    // Read CCNT Register
    asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));
    return value;
}

static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
{
    // in general enable all counters (including cycle counter)
    int32_t value = 1;

    // perform reset:
    if (do_reset)
    {
        value |= 2;     // reset all counters to zero.
        value |= 4;     // reset cycle counter to zero.
    }

    if (enable_divider)
        value |= 8;     // enable "by 64" divider for CCNT.

    value |= 16;

    // program the performance-counter control-register:
    asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));

    // enable all counters:
    asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));

    // clear overflows:
    asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}



int main()
{

    /* enable user-mode access to the performance counter*/
    asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

    /* disable counter overflow interrupts (just in case)*/
    asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

    init_perfcounters (1, 0);

    // measure the counting overhead:
    unsigned int overhead = get_cyclecount();
    overhead = get_cyclecount() - overhead;

    unsigned int t = get_cyclecount();

    // do some stuff here..
    printf("\nHello World!!");

    t = get_cyclecount() - t;

    printf ("function took exactly %u cycles (including function call) ", t - overhead);

    get_cyclecount();

    return 0;
}

Follow up 2: I wrote to Freescale for support, and they sent back the following reply and a program (I did not understand much of it):

Here is what we can help you with right now: I am sending you, attached, an example of code that sends a stream using the UART. From your code, it seems that you are not initializing the MPU correctly.

#include <stdio.h>
#include <stdlib.h>

#define BIT13 0x02000

#define R32   volatile unsigned long *
#define R16   volatile unsigned short *
#define R8   volatile unsigned char *

#define reg32_UART1_USR1     (*(R32)(0x73FBC094))
#define reg32_UART1_UTXD     (*(R32)(0x73FBC040))

#define reg16_WMCR         (*(R16)(0x73F98008))
#define reg16_WSR              (*(R16)(0x73F98002))

#define AIPS_TZ1_BASE_ADDR             0x70000000
#define IOMUXC_BASE_ADDR               AIPS_TZ1_BASE_ADDR+0x03FA8000

typedef unsigned long  U32;
typedef unsigned short U16;
typedef unsigned char  U8;


void serv_WDOG()
{
    reg16_WSR = 0x5555;
    reg16_WSR = 0xAAAA;
}


void outbyte(char ch)
{
    while( !(reg32_UART1_USR1 & BIT13)  );

    reg32_UART1_UTXD = ch ;
}


void _init()
{

}



void pause(int time) 
{
    int i;

    for ( i=0 ; i < time ;  i++);

} 


void led()
{
    // Write to Data register [DR]
    *(R32)(0x73F88000) = 0x00000040;  // 1 --> GPIO 2_6
    pause(500000);

    *(R32)(0x73F88000) = 0x00000000;  // 0 --> GPIO 2_6
    pause(500000);
}

void init_port_for_led()
{
    // GPIO 2_6   [73F8_8000] EIM_D22  (AC11)    DIAG_LED_GPIO
    // ALT1 mode
    // IOMUXC_SW_MUX_CTL_PAD_EIM_D22  [+0x0074]
    // MUX_MODE [2:0]  = 001: Select mux mode: ALT1 mux port: GPIO[6] of instance: gpio2.

    // IOMUXC control for GPIO2_6
    *(R32)(IOMUXC_BASE_ADDR + 0x74) = 0x00000001;

    // Write to DIR register [DIR]
    *(R32)(0x73F88004) = 0x00000040;  // 1 : GPIO 2_6  - output

    *(R32)(0x83FDA090) = 0x00003001;
    *(R32)(0x83FDA090) = 0x00000007;
}

int main ()
{
  int k = 0x12345678 ;

    reg16_WMCR = 0 ;                        // disable watchdog
    init_port_for_led() ;

    while(1)
    {
        printf("Hello word %x\n\r", k ) ;
        serv_WDOG() ;
        led() ;

    }

    return(1) ;
}

Solution

Accessing the performance counters isn't difficult, but you have to enable them from kernel-mode. By default the counters are disabled.

In a nutshell you have to execute the following two lines inside the kernel. Either as a loadable module or just adding the two lines somewhere in the board-init will do:

  /* enable user-mode access to the performance counter*/
  asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

  /* disable counter overflow interrupts (just in case)*/
  asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

Once you have done this, the cycle counter will start incrementing on every cycle. Overflows of the register go unnoticed and don't cause any problems (except that they might mess up your measurements).
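Because the cycle counter is a 32-bit register, a single wraparound between two reads does no harm as long as the difference is computed in unsigned 32-bit arithmetic. A minimal sketch of that idea; `cycles_elapsed` is a hypothetical helper name, not part of the code in this answer:

```c
#include <stdint.h>

/* Hypothetical helper: cycles elapsed between two raw counter reads.
 * Unsigned subtraction is performed modulo 2^32, so the result is still
 * correct if the counter wrapped around once between the two reads.
 * More than one wraparound cannot be detected this way. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, a start value of 0xFFFFFFF0 and an end value of 0x00000010 yield 0x20 elapsed cycles. Intervals longer than one full counter period still give a wrong result.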

Now you want to access the cycle-counter from the user-mode:

We start with a function that reads the register:

static inline unsigned int get_cyclecount (void)
{
  unsigned int value;
  // Read CCNT Register
  asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));  
  return value;
}

And you most likely want to reset and set the divider as well:

static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
{
  // in general enable all counters (including cycle counter)
  int32_t value = 1;

  // perform reset:
  if (do_reset)
  {
    value |= 2;     // reset all counters to zero.
    value |= 4;     // reset cycle counter to zero.
  } 

  if (enable_divider)
    value |= 8;     // enable "by 64" divider for CCNT.

  value |= 16;

  // program the performance-counter control-register:
  asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));  

  // enable all counters:  
  asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));  

  // clear overflows:
  asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}

do_reset will set the cycle-counter to zero. Easy as that.

enable_divider enables the 1/64 cycle divider. Without this flag set you'll be measuring every cycle. With it enabled, the counter is incremented once every 64 cycles. This is useful if you want to measure long durations that would otherwise cause the counter to overflow.
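To get a feeling for when the divider is needed, the raw counter value can be converted to seconds. The sketch below assumes you know the core clock frequency; the helper names `ccnt_to_seconds` and `ccnt_wrap_seconds` are made up for illustration, and the 800 MHz figure mentioned afterwards is only an assumed clock rate, so check what your board actually runs at.

```c
#include <stdint.h>

/* Hypothetical helper: convert a raw counter delta to seconds.
 * With the divider enabled, each counter tick stands for 64 CPU cycles.
 * cpu_hz is the core clock in Hz and must be supplied by the caller. */
static double ccnt_to_seconds(uint32_t ticks, int divider_enabled, double cpu_hz)
{
    double cycles = (double)ticks * (divider_enabled ? 64.0 : 1.0);
    return cycles / cpu_hz;
}

/* Hypothetical helper: time in seconds until the 32-bit counter wraps. */
static double ccnt_wrap_seconds(int divider_enabled, double cpu_hz)
{
    return 4294967296.0 * (divider_enabled ? 64.0 : 1.0) / cpu_hz;
}
```

At an assumed 800 MHz, the counter wraps after roughly 5.4 seconds without the divider and after roughly 344 seconds with it, at the cost of a 64-cycle resolution.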

How to use it:

  // init counters:
  init_perfcounters (1, 0); 

  // measure the counting overhead:
  unsigned int overhead = get_cyclecount();
  overhead = get_cyclecount() - overhead;    

  unsigned int t = get_cyclecount();

  // do some stuff here..
  call_my_function();

  t = get_cyclecount() - t;

  printf ("function took exactly %u cycles (including function call) ", t - overhead);

This should work on all Cortex-A8 CPUs.

Oh - and some notes:

Using these counters you'll measure the exact time between the two calls to get_cyclecount() including everything spent in other processes or in the kernel. There is no way to restrict the measurement to your process or a single thread.
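If you need a figure that covers only your own process, a different technique from the cycle counter is the POSIX per-process CPU-time clock. The sketch below is an alternative under assumptions, not part of the method described here: `process_cpu_seconds` is a hypothetical helper name, and the resolution of this clock on a particular kernel and board is something you would have to check yourself.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: CPU time consumed so far by the calling process,
 * in seconds. Time spent in other processes is not included.
 * Returns a negative value if the clock cannot be read. */
static double process_cpu_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Call it before and after the code under test and subtract the two values. With older glibc versions, clock_gettime() requires linking with -lrt.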

Also calling get_cyclecount() isn't free. It will compile to a single asm-instruction, but moves from the co-processor will stall the entire ARM pipeline. The overhead is quite high and can skew your measurement. Fortunately the overhead is also fixed, so you can measure it and subtract it from your timings.

In my example I did that for every measurement. Don't do this in practice. An interrupt will sooner or later occur between the two calls and skew your measurements even further. I suggest that you measure the overhead a couple of times on an idle system, ignore all outliers and use a fixed constant instead.
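One simple way to ignore outliers is to take the median of several overhead samples. This is a sketch of that idea only; `median_u32` is a hypothetical helper, and the samples are assumed to come from repeated back-to-back get_cyclecount() pairs collected on an idle system.

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of uint32_t values. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: median of n samples (upper median for even n).
 * Sorts the array in place. Returns 0 when n is 0. */
static uint32_t median_u32(uint32_t *samples, size_t n)
{
    if (n == 0)
        return 0;
    qsort(samples, n, sizeof samples[0], cmp_u32);
    return samples[n / 2];
}
```

A sample set such as {12, 11, 950, 12, 13}, where 950 stands for a run disturbed by an interrupt, gives a median of 12. Compute the value once and reuse it as the fixed overhead constant.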
