如何测量 ARM Cortex-A8 处理器中的程序执行时间? [英] How to measure program execution time in ARM Cortex-A8 processor?

查看:70
本文介绍了如何测量 ARM Cortex-A8 处理器中的程序执行时间?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用的是名为 i.MX515 的基于 ARM Cortex-A8 的处理器.有 linux Ubuntu 9.10 发行版.我正在运行一个用 C 编写的非常大的应用程序,我正在使用 gettimeofday(); 函数来测量我的应用程序花费的时间.

I'm using an ARM Cortex-A8 based processor called as i.MX515. There is linux Ubuntu 9.10 distribution. I'm running a very big application written in C and I'm making use of gettimeofday(); functions to measure the time my application takes.

main()

{

gettimeofday(start);
....
....
....
gettimeofday(end);

}

这种方法足以查看我的应用程序的哪些块占用了多少时间.但是,现在,我正在尝试非常彻底地优化我的代码,使用 gettimeofday() 计算时间的方法,我看到连续运行之间有很多波动(在优化之前和之后运行),所以我无法确定实际执行时间,从而确定我的改进的影响.

This method was sufficient to look at what blocks of my application was taking what amount of time. But, now that, I'm trying to optimize my code very throughly, with the gettimeofday() method of calculating time, I see a lot of fluctuation between successive runs (Run before and after my optimizations), so I'm not able to determine the actual execution times, hence the impact of my improvements.

谁能建议我应该怎么做?

Can anyone suggest me what I should do?

如果通过访问循环计数器(ARM 网站上为 Cortex-M3 推荐的想法),任何人都可以指向我一些代码,这些代码为我提供了访问计时器所必须遵循的步骤 在 Cortex-A8 上注册?

If by accessing the cycle counter (Idea suggested on ARM website for Cortex-M3) can anyone point me to some code which gives me the steps I have to follow to access the timer registers on Cortex-A8?

如果这种方法不是很准确,请提出一些替代方法.

If this method is not very accurate then please suggest some alternatives.

谢谢

Follow up 1: 在 Code Sorcery 上写了以下程序,生成了可执行文件,当我尝试在板上运行时,我得到 - 非法指令消息:(

static inline unsigned int get_cyclecount (void)
{
    unsigned int value;
    // Read CCNT Register
    asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));
    return value;
}

static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
{
    // in general enable all counters (including cycle counter)
    int32_t value = 1;

    // peform reset:
    if (do_reset)
    {
    value |= 2;     // reset all counters to zero.
    value |= 4;     // reset cycle counter to zero.
    }

    if (enable_divider)
    value |= 8;     // enable "by 64" divider for CCNT.

    value |= 16;

    // program the performance-counter control-register:
    asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));

    // enable all counters:
    asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));

    // clear overflows:
    asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}



int main()
{

    /* enable user-mode access to the performance counter*/
asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

/* disable counter overflow interrupts (just in case)*/
asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

    init_perfcounters (1, 0);

    // measure the counting overhead:
    unsigned int overhead = get_cyclecount();
    overhead = get_cyclecount() - overhead;

    unsigned int t = get_cyclecount();

    // do some stuff here..
    printf("\nHello World!!");

    t = get_cyclecount() - t;

    printf ("function took exactly %d cycles (including function call) ", t - overhead);

    get_cyclecount();

    return 0;
}

跟进 2:我已经写信给飞思卡尔寻求支持,他们给我发回了以下回复和一个程序(我对此不太了解)

以下是我们现在可以为您提供的帮助:我正在向您发送一个代码示例,该示例使用 UART 发送流,从您的代码来看,您似乎没有正确初始化 MPU.

(hash)include <stdio.h>
(hash)include <stdlib.h>

(hash)define BIT13 0x02000

(hash)define R32   volatile unsigned long *
(hash)define R16   volatile unsigned short *
(hash)define R8   volatile unsigned char *

(hash)define reg32_UART1_USR1     (*(R32)(0x73FBC094))
(hash)define reg32_UART1_UTXD     (*(R32)(0x73FBC040))

(hash)define reg16_WMCR         (*(R16)(0x73F98008))
(hash)define reg16_WSR              (*(R16)(0x73F98002))

(hash)define AIPS_TZ1_BASE_ADDR             0x70000000
(hash)define IOMUXC_BASE_ADDR               AIPS_TZ1_BASE_ADDR+0x03FA8000

typedef unsigned long  U32;
typedef unsigned short U16;
typedef unsigned char  U8;


void serv_WDOG()
{
    reg16_WSR = 0x5555;
    reg16_WSR = 0xAAAA;
}


void outbyte(char ch)
{
    while( !(reg32_UART1_USR1 & BIT13)  );

    reg32_UART1_UTXD = ch ;
}


void _init()
{

}



void pause(int time) 
{
    int i;

    for ( i=0 ; i < time ;  i++);

} 


void led()
{

//Write to Data register [DR]

    *(R32)(0x73F88000) = 0x00000040;  // 1 --> GPIO 2_6 
    pause(500000);

    *(R32)(0x73F88000) = 0x00000000;  // 0 --> GPIO 2_6 
    pause(500000);


}

void init_port_for_led()
{


//GPIO 2_6   [73F8_8000] EIM_D22  (AC11)    DIAG_LED_GPIO
//ALT1 mode
//IOMUXC_SW_MUX_CTL_PAD_EIM_D22  [+0x0074]
//MUX_MODE [2:0]  = 001: Select mux mode: ALT1 mux port: GPIO[6] of instance: gpio2.

 // IOMUXC control for GPIO2_6

*(R32)(IOMUXC_BASE_ADDR + 0x74) = 0x00000001; 

//Write to DIR register [DIR]

*(R32)(0x73F88004) = 0x00000040;  // 1 : GPIO 2_6  - output

*(R32)(0x83FDA090) = 0x00003001;
*(R32)(0x83FDA090) = 0x00000007;


}

int main ()
{
  int k = 0x12345678 ;

    reg16_WMCR = 0 ;                        // disable watchdog
    init_port_for_led() ;

    while(1)
    {
        printf("Hello word %x\n\r", k ) ;
        serv_WDOG() ;
        led() ;

    }

    return(1) ;
}

推荐答案

访问性能计数器并不困难,但您必须从内核模式启用它们.默认情况下,计数器被禁用.

Accessing the performance counters isn't difficult, but you have to enable them from kernel-mode. By default the counters are disabled.

简而言之,您必须在内核中执行以下两行.无论是作为可加载模块还是在 board-init 的某处添加两行都可以:

In a nutshell you have to execute the following two lines inside the kernel. Either as a loadable module or just adding the two lines somewhere in the board-init will do:

  /* enable user-mode access to the performance counter*/
  asm ("MCR p15, 0, %0, C9, C14, 0\n\t" :: "r"(1));

  /* disable counter overflow interrupts (just in case)*/
  asm ("MCR p15, 0, %0, C9, C14, 2\n\t" :: "r"(0x8000000f));

执行此操作后,循环计数器将开始为每个循环递增.寄存器的溢出不会被注意到,也不会造成任何问题(除非它们可能会扰乱您的测量).

Once you did this the cycle counter will start incrementing for each cycle. Overflows of the register will go unnoticed and don't cause any problems (except they might mess up your measurements).

现在你想从用户模式访问循环计数器:

Now you want to access the cycle-counter from the user-mode:

我们从读取寄存器的函数开始:

We start with a function that reads the register:

static inline unsigned int get_cyclecount (void)
{
  unsigned int value;
  // Read CCNT Register
  asm volatile ("MRC p15, 0, %0, c9, c13, 0\t\n": "=r"(value));  
  return value;
}

而且您很可能还想重置和设置分隔线:

And you most likely want to reset and set the divider as well:

static inline void init_perfcounters (int32_t do_reset, int32_t enable_divider)
{
  // in general enable all counters (including cycle counter)
  int32_t value = 1;

  // peform reset:  
  if (do_reset)
  {
    value |= 2;     // reset all counters to zero.
    value |= 4;     // reset cycle counter to zero.
  } 

  if (enable_divider)
    value |= 8;     // enable "by 64" divider for CCNT.

  value |= 16;

  // program the performance-counter control-register:
  asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value));  

  // enable all counters:  
  asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));  

  // clear overflows:
  asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
}

do_reset 将循环计数器设置为零.就这么简单.

do_reset will set the cycle-counter to zero. Easy as that.

enable_diver 将启用 1/64 周期分频器.如果没有设置此标志,您将测量每个周期.启用它后,计数器每 64 个周期增加一次.如果您想测量很长时间会导致计数器溢出,这很有用.

enable_diver will enable the 1/64 cycle divider. Without this flag set you'll be measuring each cycle. With it enabled the counter gets increased for every 64 cycles. This is useful if you want to measure long times that would otherwise cause the counter to overflow.

使用方法:

  // init counters:
  init_perfcounters (1, 0); 

  // measure the counting overhead:
  unsigned int overhead = get_cyclecount();
  overhead = get_cyclecount() - overhead;    

  unsigned int t = get_cyclecount();

  // do some stuff here..
  call_my_function();

  t = get_cyclecount() - t;

  printf ("function took exactly %d cycles (including function call) ", t - overhead);

应该适用于所有 Cortex-A8 CPU..

Should work on all Cortex-A8 CPUs..

哦 - 还有一些注意事项:

Oh - and some notes:

使用这些计数器,您将测量两次调用 get_cyclecount() 之间的准确时间,包括在其他进程或内核中花费的所有内容.无法将测量限制为您的进程或单个线程.

Using these counters you'll measure the exact time between the two calls to get_cyclecount() including everything spent in other processes or in the kernel. There is no way to restrict the measurement to your process or a single thread.

调用 get_cyclecount() 也不是免费的.它将编译为单个 asm 指令,但从协处理器移出会使整个 ARM 流水线停止.开销非常高,可能会影响您的测量.幸运的是,开销也是固定的,因此您可以测量它并从您的计时中减去它.

Also calling get_cyclecount() isn't free. It will compile to a single asm-instruction, but moves from the co-processor will stall the entire ARM pipeline. The overhead is quite high and can skew your measurement. Fortunately the overhead is also fixed, so you can measure it and subtract it from your timings.

在我的示例中,我对每次测量都这样做.在实践中不要这样做.两次调用之间迟早会发生中断,从而进一步扭曲您的测量结果.我建议您在空闲系统上测量几次开销,忽略所有外部人员并使用固定常量代替.

In my example I did that for every measurement. Don't do this in practice. An interrupt will sooner or later occur between the two calls and skew your measurements even further. I suggest that you measure the overhead a couple of times on an idle system, ignore all outsiders and use a fixed constant instead.

这篇关于如何测量 ARM Cortex-A8 处理器中的程序执行时间?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆