优化 ARM Cortex M3 代码 [英] Optimizing ARM Cortex M3 code

查看：33 发布时间：2021/11/17 21:54:46 assembly optimization arm stm32 disassembly

本文介绍了优化 ARM Cortex M3 代码的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个 C 函数，它试图将帧缓冲区复制到 FSMC RAM.

I have a C Function which tries to copy a framebuffer to FSMC RAM.

这些函数将游戏循环的帧速率消耗到 10FPS.我想知道如何分析反汇编函数，我应该计算每个指令周期吗?我想知道 CPU 的时间在哪里，在哪个部分.我确定算法也是一个问题，因为它的O(N^2)

The functions eats the frame rate of the game loop to 10FPS. I would like to know how to analyze the disassembled function, should I count each instruction cycle ? I want to know where the CPU spend its time, in which part. I'm sure that the algorithm is also a problem, because its O(N^2)

C 函数是:

void LCD_Flip()
{

    u8  i,j;


    LCD_SetCursor(0x00, 0x0000);
    LCD_WriteRegister(0x0050,0x00);//GRAM horizontal start position
    LCD_WriteRegister(0x0051,239);//GRAM horizontal end position
    LCD_WriteRegister(0x0052,0);//Vertical GRAM Start position
    LCD_WriteRegister(0x0053,319);//Vertical GRAM end position
    LCD_WriteIndex(0x0022);

    for(j=0;j<fbHeight;j++)
    {
        for(i=0;i<240;i++)
        {
            u16 color = frameBuffer[i+j*fbWidth];
            LCD_WriteData(color);

        }
    }

}

反汇编函数:

08000fd0 <LCD_Flip>:
 8000fd0:   b580        push    {r7, lr}
 8000fd2:   b082        sub sp, #8
 8000fd4:   af00        add r7, sp, #0
 8000fd6:   2000        movs    r0, #0
 8000fd8:   2100        movs    r1, #0
 8000fda:   f7ff fde9   bl  8000bb0 <LCD_SetCursor>
 8000fde:   2050        movs    r0, #80 ; 0x50
 8000fe0:   2100        movs    r1, #0
 8000fe2:   f7ff feb5   bl  8000d50 <LCD_WriteRegister>
 8000fe6:   2051        movs    r0, #81 ; 0x51
 8000fe8:   21ef        movs    r1, #239    ; 0xef
 8000fea:   f7ff feb1   bl  8000d50 <LCD_WriteRegister>
 8000fee:   2052        movs    r0, #82 ; 0x52
 8000ff0:   2100        movs    r1, #0
 8000ff2:   f7ff fead   bl  8000d50 <LCD_WriteRegister>
 8000ff6:   2053        movs    r0, #83 ; 0x53
 8000ff8:   f240 113f   movw    r1, #319    ; 0x13f
 8000ffc:   f7ff fea8   bl  8000d50 <LCD_WriteRegister>
 8001000:   2022        movs    r0, #34 ; 0x22
 8001002:   f7ff fe87   bl  8000d14 <LCD_WriteIndex>
 8001006:   2300        movs    r3, #0
 8001008:   71bb        strb    r3, [r7, #6]
 800100a:   e01b        b.n 8001044 <LCD_Flip+0x74>
 800100c:   2300        movs    r3, #0
 800100e:   71fb        strb    r3, [r7, #7]
 8001010:   e012        b.n 8001038 <LCD_Flip+0x68>
 8001012:   79f9        ldrb    r1, [r7, #7]
 8001014:   79ba        ldrb    r2, [r7, #6]
 8001016:   4613        mov r3, r2
 8001018:   011b        lsls    r3, r3, #4
 800101a:   1a9b        subs    r3, r3, r2
 800101c:   011b        lsls    r3, r3, #4
 800101e:   1a9b        subs    r3, r3, r2
 8001020:   18ca        adds    r2, r1, r3
 8001022:   4b0b        ldr r3, [pc, #44]   ; (8001050 <LCD_Flip+0x80>)
 8001024:   f833 3012   ldrh.w  r3, [r3, r2, lsl #1]
 8001028:   80bb        strh    r3, [r7, #4]
 800102a:   88bb        ldrh    r3, [r7, #4]
 800102c:   4618        mov r0, r3
 800102e:   f7ff fe7f   bl  8000d30 <LCD_WriteData>
 8001032:   79fb        ldrb    r3, [r7, #7]
 8001034:   3301        adds    r3, #1
 8001036:   71fb        strb    r3, [r7, #7]
 8001038:   79fb        ldrb    r3, [r7, #7]
 800103a:   2bef        cmp r3, #239    ; 0xef
 800103c:   d9e9        bls.n   8001012 <LCD_Flip+0x42>
 800103e:   79bb        ldrb    r3, [r7, #6]
 8001040:   3301        adds    r3, #1
 8001042:   71bb        strb    r3, [r7, #6]
 8001044:   79bb        ldrb    r3, [r7, #6]
 8001046:   2b63        cmp r3, #99 ; 0x63
 8001048:   d9e0        bls.n   800100c <LCD_Flip+0x3c>
 800104a:   3708        adds    r7, #8
 800104c:   46bd        mov sp, r7
 800104e:   bd80        pop {r7, pc}

推荐答案

不能完全回答你的问题，但我看到你渴望快速循环的执行.

Not exactly answering your question, but I see you aspire for fast execution of the loops.

以下是书中的一些提示:'ARM 系统开发人员指南:设计和优化系统软件(计算机体系结构中的摩根考夫曼系列和设计)'http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745

Here are some tips from the book: 'ARM System Developer's Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design)' http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745

第 5 章包含名为C 循环结构"的部分.以下是该部分的摘要:

Chapter 5 contains section named 'C looping structures'. Here is the summary of the section:

高效编写循环

使用倒计时到零的循环.那么编译器就不需要分配一个寄存器来保存终止值，与零的比较是自由的.
默认使用无符号循环计数器和继续条件 i!=0 而不是 i>0.这将确保循环开销只有两条指令.
当您知道循环将至少迭代一次时，请使用 do-while 循环而不是 for 循环.这节省了编译器检查以查看循环计数是否为零.
展开重要循环以减少循环开销.不要过度展开.如果循环开销占总数的比例很小，那么展开将增加代码大小并损害缓存的性能.
尽量安排数组中元素的个数是四或八的倍数.然后，您可以轻松地展开循环 2、4 或 8 次，而无需担心剩余的数组元素.

根据总结，您的内部循环可能如下所示.

Based on the summary, your inner loop might look as below.

uinsigned int i = 240/4;  // Use unsigned loop counters by default
                          // and the continuation condition i!=0

do
{
    // Unroll important loops to reduce the loop overhead
    LCD_WriteData( (u16)frameBuffer[ (i--) + (j*fbWidth) ] );
    LCD_WriteData( (u16)frameBuffer[ (i--) + (j*fbWidth) ] );
    LCD_WriteData( (u16)frameBuffer[ (i--) + (j*fbWidth) ] );
    LCD_WriteData( (u16)frameBuffer[ (i--) + (j*fbWidth) ] );
}
while ( i != 0 )  // Use do-while loops rather than for
                  // loops when you know the loop will
                  // iterate at least once

您可能还想尝试使用编译指示"，例如:

You might want to experiment also with 'pragmas', e.g. :

#pragma Otime

http://www.keil.com/support/man/docs/armcc/armcc_chr1359124989673.htm

#pragma unroll(n)

http://www.keil.com/support/man/docs/armcc/armcc_chr1359124992247.htm

因为是 Cortex-M3，所以尝试找出 MCU 硬件是否让您有机会安排代码/数据以利用其 Harvard 架构(我体验了 30% 的速度提升).

And as it is Cortex-M3 try to find out if MCU hardware gives you chance to arrange the code/data to take advantage of its Harvard architecture (I experienced 30% speed increase).

在这里查看我的其他答案

也许并非所有内容都适用于您的应用程序(以相反的顺序填充缓冲区).我只是想画您对本书的关注以及可能的优化点.

Maybe not everything may be applicable in your application (filling a buffer in reverse order). I just wanted to draw your attention to the book and possible points for optimization.

这篇关于优化 ARM Cortex M3 代码的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

优化 ARM Cortex M3 代码 [英] Optimizing ARM Cortex M3 code

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

优化 ARM Cortex M3 代码 [英] Optimizing ARM Cortex M3 code

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭