memset slow on 32-bit embedded platform
Question
I am developing on an embedded device (STM32, ARM Cortex-M4) and expected memset and similar functions to be optimized for speed. However, I noticed much slower behavior than expected. I'm using the GNU ARM embedded compiler/linker (arm-none-eabi-gcc, etc.) with the -O3 optimization flag.
I looked into the disassembly, and the memset function is writing one byte at a time and rechecking bounds at each iteration.
0x802e2c4 <memset>: add r2, r0
0x802e2c6 <memset+2>: mov r3, r0
0x802e2c8 <memset+4>: cmp r3, r2
0x802e2ca <memset+6>: bne.n 0x802e2ce <memset+10>
0x802e2cc <memset+8>: bx lr
0x802e2ce <memset+10>: strb.w r1, [r3], #1
0x802e2d2 <memset+14>: b.n 0x802e2c8
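For readers less used to Thumb assembly, the loop above corresponds roughly to the following C. This is a reading of the disassembly, not the library's actual source, and the name memset_bytewise is hypothetical.

```c
#include <stddef.h>

/* Hypothetical name: a C rendering of the byte-at-a-time loop
 * shown in the disassembly, not the library's real source. */
void *memset_bytewise(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    unsigned char *end = p + n;          /* add r2, r0 */

    while (p != end) {                   /* cmp r3, r2 ; bne.n */
        *p++ = (unsigned char)value;     /* strb.w r1, [r3], #1 */
    }
    return dst;                          /* bx lr (r0 still holds dst) */
}
```

Each byte stored costs a compare and two branches on top of the store itself, which is why the routine is small but slow.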
Naturally, this code could be sped up by using 32-bit writes and/or loop unrolling at the expense of code size. It is possible the implementers chose not to optimize this for speed in order to keep code size down.
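As a minimal sketch of what "32-bit writes and loop unrolling" could look like, the routine below aligns the pointer with byte stores, fills the bulk with word stores (unrolled four times), and finishes the tail byte by byte. The name memset_wordwise is hypothetical, and this is an illustration under those assumptions, not the implementation shipped in any C library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: illustrative word-wise fill, not a library routine.
 * Note: storing through a uint32_t pointer is how C libraries commonly
 * do this, but it formally sidesteps strict aliasing rules; treat it as
 * library-style code (or build with -fno-strict-aliasing). */
void *memset_wordwise(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)value;

    /* Head: byte stores until p is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = byte;
        n--;
    }

    /* Replicate the byte into all four lanes of a 32-bit word. */
    uint32_t word = (uint32_t)byte * 0x01010101u;
    uint32_t *w = (uint32_t *)p;

    /* Body: four word stores per iteration (16 bytes). */
    while (n >= 16) {
        w[0] = word;
        w[1] = word;
        w[2] = word;
        w[3] = word;
        w += 4;
        n -= 16;
    }

    /* Remaining whole words. */
    while (n >= 4) {
        *w++ = word;
        n -= 4;
    }

    /* Tail: up to three leftover bytes. */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

The trade-off is the one described above: the function is several times larger than the seven-instruction loop, in exchange for far fewer stores and branches on large buffers. At high optimization levels a compiler may recognize such loops and turn them back into a call to memset, so checking the generated assembly is still worthwhile.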
The memset header and library are being included from:
C:\Program Files (x86)\GNU Tools Arm Embedded\7 2018-q2-update\arm-none-eabi\include\string.h
C:\Program Files (x86)\GNU Tools Arm Embedded\7 2018-q2-update\arm-none-eabi\include\c++\7.3.1\cmath
This question is similar to existing questions but is different in that it targets an embedded platform.
Is there an optimized memset readily available in the GNU ARM embedded package? If so, how can I access it?
Recommended answer
Link without -specs=nano.specs. The linker will then use the standard build of the C library, whose memset is optimized for speed instead of size. This will also pull in larger versions of many other functions (the usual suspects are printf and malloc), which could in turn be optimized with additional linker options. Examining the disassembly and the linker map file will help.