memset slow on 32-bit embedded platform

Problem description

I am developing on an embedded device (STM32, ARM-Cortex M4) and expected memset and similar functions to be optimized for speed. However, I noticed much slower behavior than expected. I'm using GNU ARM embedded compiler/linker (arm-none-eabi-gcc, etc) with the -O3 optimization flag.

I looked into the disassembly and the memset function is writing one byte at a time and rechecking bounds at each iteration.

0x802e2c4 <memset>: add r2, r0
0x802e2c6 <memset+2>:   mov r3, r0
0x802e2c8 <memset+4>:   cmp r3, r2
0x802e2ca <memset+6>:   bne.n   0x802e2ce <memset+10>
0x802e2cc <memset+8>:   bx  lr
0x802e2ce <memset+10>:  strb.w  r1, [r3], #1
0x802e2d2 <memset+14>:  b.n 0x802e2c8
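
For readers less used to Thumb-2 assembly, the loop above corresponds roughly to the C below. This is a reading of the disassembly, not the library's actual source, and the name memset_bytewise is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical C rendering of the disassembled loop (not the library source). */
void *memset_bytewise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;        /* mov r3, r0              */
    unsigned char *end = p + n;    /* add r2, r0              */

    while (p != end) {             /* cmp r3, r2 ; bne.n      */
        *p++ = (unsigned char)c;   /* strb.w r1, [r3], #1     */
    }
    return dst;                    /* bx lr (r0 is untouched) */
}
```

Each pass through the loop stores a single byte and then takes two branches, so the per-byte overhead is high compared with word stores.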

Naturally, this code could be sped up by using 32-bit writes and/or loop unrolling at the expense of code size. It is possible the implementers chose not to optimize this for speed in order to keep code size down.
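
As a minimal sketch of what "32-bit writes and loop unrolling" could look like, assuming GCC (the may_alias attribute is a GCC extension) and a target where aligned 32-bit stores are cheap: the function name memset_words and the unroll factor of four are illustrative choices, not anything taken from the toolchain's library.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC extension: lets us store 32-bit words into memory of any type
   without violating strict-aliasing rules. */
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* Hypothetical word-wise fill; not part of any toolchain library. */
void *memset_words(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char byte = (unsigned char)c;

    /* Head: byte stores until p is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = byte;
        n--;
    }

    /* Replicate the fill byte into all four byte lanes of a word. */
    uint32_t word = (uint32_t)byte * UINT32_C(0x01010101);
    u32_alias *w = (u32_alias *)p;

    /* Body, unrolled by four: 16 bytes per iteration. */
    while (n >= 16) {
        w[0] = word;
        w[1] = word;
        w[2] = word;
        w[3] = word;
        w += 4;
        n -= 16;
    }

    /* Remaining whole words. */
    while (n >= 4) {
        *w++ = word;
        n -= 4;
    }

    /* Tail: up to three leftover bytes. */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

A call such as memset_words(buffer, 0, sizeof buffer) behaves like memset for ordinary RAM. Note that GCC at higher optimization levels may recognize fill loops like these and turn them back into a call to memset, so it is worth checking the disassembly of whatever the compiler actually emits.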

The memset header and library are being included from:

C:\Program Files (x86)\GNU Tools Arm Embedded\7 2018-q2-update\arm-none-eabi\include\string.h
C:\Program Files (x86)\GNU Tools Arm Embedded\7 2018-q2-update\arm-none-eabi\include\c++\7.3.1\cmath

This question is similar to existing questions but is different in that it targets an embedded platform.

Is there an optimized memset readily available in the GNU ARM embedded package? If so, how can I access it?

Recommended answer

Link without -specs=nano.specs. The build then uses the version of the C library whose functions, memset included, are optimized for speed instead of size. This also pulls in larger versions of many other functions (the usual suspects are printf and malloc), whose size could in turn be reduced with additional linker options. Examining the disassembly and the linker map file will help confirm which versions were linked.
