C++ Tips for code optimization on ARM devices

Question

I have been developing C++ code for augmented reality on ARM devices, and optimizing the code is very important in order to keep a good frame rate. To raise efficiency to the maximum level, I think it is important to gather general tips that make life easier for compilers and reduce the number of cycles the program takes. Any suggestions are welcome.

1- Avoid high-cost instructions: division, square root, sin, cos


  • Use logical shifts to divide or multiply by 2.

  • Multiply by the inverse when possible.

2- Optimize inner "for" loops: they are a bottleneck, so we should avoid doing many calculations inside them, especially divisions and square roots.

3- Use look-up tables for some mathematical functions (sin, cos, ...)
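
The three tips above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the function names (`scale_by_inverse`, `div_pow2`, `lut_sin`) and the 256-entry table size are assumptions chosen for the example. Whether a table actually beats computing the function directly depends on the cache behavior of the target, so it is worth measuring.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Tips 1 and 2: do the expensive division once, outside the loop, and
// multiply by the reciprocal inside it. Note that x * (1/d) can round
// slightly differently from x / d in floating point.
void scale_by_inverse(float* data, std::size_t n, float divisor) {
    assert(divisor != 0.0f);
    const float inv = 1.0f / divisor;  // the only division
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= inv;
    }
}

// Tip 1: for unsigned integers, dividing by 2^k is a right shift by k.
// Negative signed values round differently, so this helper is unsigned only.
inline std::uint32_t div_pow2(std::uint32_t x, unsigned k) {
    assert(k < 32);
    return x >> k;
}

// Tip 3: a sine look-up table. The table size is a hypothetical choice;
// an 8-bit phase maps 0..255 onto one full turn [0, 2*pi).
constexpr std::size_t kSineTableSize = 256;

inline const std::array<float, kSineTableSize>& sine_table() {
    // Built once on first use, then only read.
    static const std::array<float, kSineTableSize> table = [] {
        std::array<float, kSineTableSize> t{};
        const double two_pi = 6.283185307179586;
        for (std::size_t i = 0; i < kSineTableSize; ++i) {
            t[i] = static_cast<float>(
                std::sin(two_pi * static_cast<double>(i) / kSineTableSize));
        }
        return t;
    }();
    return table;
}

inline float lut_sin(std::uint8_t phase) {
    return sine_table()[phase];  // index is always in range for uint8_t
}
```

With this layout, `lut_sin(64)` corresponds to a quarter turn. The table trades accuracy (256 steps per turn, no interpolation) for a single memory read per call.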

Useful tools


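  • objdump: gets the assembly code of the compiled program. This makes it possible to compare two functions and check whether the code has really been optimized.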

Recommended answer

To answer your question about general rules when optimizing C++ code for ARM, here are a few suggestions:

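1) As you mentioned, division is expensive, and many older ARM cores (for example Cortex-A8 and Cortex-A9) have no hardware integer divide instruction at all. Use logical shifts or multiply by the inverse when possible.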
2) Memory is much slower than CPU execution; use logical operations to avoid small lookup tables.
3) Try to write 32 bits at a time to make the best use of the write buffer. Writing shorts or chars will slow the code down considerably. In other words, it's faster to logical-OR the smaller values together and write them as 32-bit words (DWORDs).
4) Be aware of your L1/L2 cache size. As a general rule, ARM chips have much smaller caches than Intel.
5) Use SIMD (NEON) when possible. NEON instructions are quite powerful and for "vectorizable" code, can be quite fast. NEON intrinsics are available in most C++ environments and can be nearly as fast as writing hand tuned ASM code.
6) Use the cache prefetch hint (PLD) to speed up looping reads. ARM doesn't have smart precache logic the way that modern Intel chips do.
7) Don't trust the compiler to generate good code. Look at the ASM output and rewrite hotspots in ASM. For bit/byte manipulation, the C language can't specify things as efficiently as they can be accomplished in ASM. ARM has powerful 3-operand instructions, multi-load/store and "free" shifts that can outperform what the compiler is capable of generating.
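
Points 3, 5 and 6 can be sketched in C++ roughly as follows. This is an illustration under stated assumptions rather than tuned code: `pack_rgba`, `add_arrays` and the prefetch distance `kPrefetchAhead` are hypothetical names and values chosen for the example. The NEON path uses intrinsics from `<arm_neon.h>` and is compiled only when the compiler defines `__ARM_NEON`; elsewhere the scalar loop does all the work. `__builtin_prefetch` is a GCC/Clang builtin, and whether an explicit prefetch helps depends on the core, so it should be measured.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Point 3: OR four 8-bit channels together and issue one 32-bit store per
// pixel instead of four separate byte stores.
void pack_rgba(const std::uint8_t* r, const std::uint8_t* g,
               const std::uint8_t* b, const std::uint8_t* a,
               std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::uint32_t>(r[i])
               | (static_cast<std::uint32_t>(g[i]) << 8)
               | (static_cast<std::uint32_t>(b[i]) << 16)
               | (static_cast<std::uint32_t>(a[i]) << 24);
    }
}

// Points 5 and 6: add two float arrays four lanes at a time with NEON
// intrinsics, hinting the cache about data needed a few iterations ahead.
void add_arrays(const float* x, const float* y, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Prefetch distance in elements; a hypothetical value that needs tuning.
    constexpr std::size_t kPrefetchAhead = 64;
    for (; i + 4 <= n; i += 4) {
#if defined(__GNUC__)
        if (i + kPrefetchAhead < n) {  // stay inside the arrays
            __builtin_prefetch(x + i + kPrefetchAhead);
            __builtin_prefetch(y + i + kPrefetchAhead);
        }
#endif
        const float32x4_t vx = vld1q_f32(x + i);  // load 4 floats
        const float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(out + i, vaddq_f32(vx, vy));    // add and store 4 floats
    }
#endif
    // Scalar tail, and the whole loop on targets without NEON.
    for (; i < n; ++i) {
        out[i] = x[i] + y[i];
    }
}
```

Keeping a scalar fallback next to the intrinsics makes it easy to check the vector path against a reference result and to compare the two in the disassembly.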
