Producing optimised NDK code for multiple architectures?


Question

I have some C code for Android that does lots of low-level number crunching. I'd like to know what settings I should use (e.g. in my Android.mk and Application.mk files) so that the code produced will run on all current Android devices but also takes advantage of optimisations for specific chipsets. I'm looking for good default Android.mk and Application.mk settings, and I want to avoid littering my C code with #ifdef branches.

For example, I'm aware that ARMv7 has floating point instructions, that some ARMv7 chips support NEON instructions, and that the default ARM build supports neither of these. Is it possible to set flags so that I can build ARMv7 with NEON, ARMv7 without NEON, and the default ARM build? I know how to do the latter two but not all 3. I'm cautious about which settings I use, as I assume the current defaults are the safest, and I'd like to know what risks the other options carry.

For GCC-specific optimisation, I'm using the following flags:

LOCAL_CFLAGS=-ffast-math -O3 -funroll-loops

I've checked that all 3 of these speed up my code. Are there any other common ones I could add?

Another tip I have is to add "LOCAL_ARM_MODE := arm" to Android.mk to enable a speed-up on newer ARM chips (although I'm confused about exactly what this does and what happens on older chips).

Recommended answer

ARM processors support 2 general instruction sets: "ARM" and "Thumb". Though there are different flavors of both, ARM instructions are 32 bits each and Thumb instructions are 16 bits. The main difference between the two is that a single ARM instruction can do more than a single Thumb instruction can. For example, a single ARM instruction can add one register to another register while performing a left shift on the second register. In Thumb, one instruction would have to do the shift, and then a second instruction would do the addition.
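As a rough illustration of the shift-and-add case described above, consider the small C function below. The name add_shifted is hypothetical and not part of the NDK. A compiler targeting ARM mode can typically fold the shift into the addition (something like "add r0, r0, r1, lsl #2"), whereas classic 16-bit Thumb needs a separate shift instruction first. The exact output depends on the compiler version and flags, so treat this as a sketch rather than a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + (b << 2), i.e. a + 4*b.
 * Unsigned arithmetic is used so the shift and the addition are
 * well defined (they wrap modulo 2^32) for every input. */
uint32_t add_shifted(uint32_t a, uint32_t b)
{
    return a + (b << 2);
}
```

Compiling this once with LOCAL_ARM_MODE := arm and once without, then comparing the disassembly, is one way to see the difference for yourself.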

ARM instructions are not twice as good, but in certain cases they can be faster. This is especially true of hand-written ARM assembly, which can be tuned in novel ways to make the best use of "shifts for free". Thumb instructions have an advantage besides size: they drain the battery less.

Anyway, this is what LOCAL_ARM_MODE := arm does - it means you compile your code as ARM instructions instead of Thumb instructions. Compiling to Thumb is the default in the NDK, as it tends to create a smaller binary and the speed difference is not that noticeable for most code. The compiler can't always take advantage of the extra "oomph" that ARM can provide, so you end up needing more or less the same number of instructions anyway.
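If you are unsure which mode a given source file was actually built in, a minimal sketch like the one below can report it. It relies on the __thumb__ and __arm__ macros that GCC and Clang predefine for 32-bit ARM targets; the function name instruction_set_name is made up for this example.

```c
/* Reports which instruction set this translation unit was compiled for.
 * __thumb__ is checked first because __arm__ is defined for every
 * 32-bit ARM target, whether the code is built as ARM or as Thumb. */
const char *instruction_set_name(void)
{
#if defined(__thumb__)
    return "Thumb";
#elif defined(__arm__)
    return "ARM";
#else
    return "not 32-bit ARM";
#endif
}
```

Logging the returned string once at start-up should show "ARM" for files built with LOCAL_ARM_MODE := arm and "Thumb" for the NDK default.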

The results you see from C/C++ code compiled to ARM or Thumb will be identical (barring compiler bugs).

This by itself is compatible with both new and old ARM processors in all Android phones available today. This is because by default the NDK compiles to an "Application Binary Interface" for ARM-based CPUs that support the ARMv5TE instruction set. This ABI is known as "armeabi" and can be set explicitly in Application.mk by putting APP_ABI := armeabi.

Newer processors also support the Android-specific ABI known as armeabi-v7a, which extends armeabi to add the Thumb-2 instruction set (http://www.arm.com/products/processors/technologies/instruction-set-architectures.php?tab=Thumb-2+) and a hardware floating point instruction set called VFPv3-D16. armeabi-v7a compatible CPUs can also optionally support the NEON instruction set, which you have to check for at run time, providing one code path for when it is available and another for when it is not. There's an example in the NDK/samples directory that does this (hello-neon). Under the hood, Thumb-2 is more "ARM-like" in that each of its instructions can do more, while still having the advantage of taking up less space.

In order to compile a "fat binary" that contains both armeabi and armeabi-v7a libraries, you would add the following to Application.mk:

APP_ABI := armeabi armeabi-v7a

When the .apk file is installed, the Android package manager installs the best library for the device. So on older platforms it would install the armeabi library, and on newer devices the armeabi-v7a one.
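To confirm which of the two libraries ended up on a particular device, one option is to have the library report the ABI it was built for. The sketch below assumes the compiler predefines __ARM_ARCH_7A__ for armeabi-v7a builds and __ARM_NEON__ when NEON code generation is enabled, which is the case for GCC; the function name compiled_abi_name is hypothetical.

```c
/* Returns a label for the ABI this library was compiled for,
 * decided entirely at compile time from predefined macros. */
const char *compiled_abi_name(void)
{
#if defined(__arm__)
#  if defined(__ARM_ARCH_7A__)
#    if defined(__ARM_NEON__)
    return "armeabi-v7a/NEON";
#    else
    return "armeabi-v7a";
#    endif
#  else
    return "armeabi";
#  endif
#else
    return "non-ARM";
#endif
}
```

Because the check happens at compile time, it tells you which library was installed, not what the CPU is capable of.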

If you want to test for CPU features at run time, you can use the NDK function uint64_t android_getCpuFeatures() to get the features supported by the processor. It returns a set of bit flags: ANDROID_CPU_ARM_FEATURE_ARMv7 is set on v7a processors, ANDROID_CPU_ARM_FEATURE_VFPv3 if hardware floating point is supported, and ANDROID_CPU_ARM_FEATURE_NEON if Advanced SIMD instructions are supported. An ARM CPU can't have NEON without VFPv3.
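A minimal sketch of such a run-time check, combined with choosing between two code paths, might look like the following. The names cpu_has_neon, sum_scalar, sum_fn and select_sum are hypothetical; only android_getCpuFamily(), android_getCpuFeatures() and the ANDROID_CPU_* constants come from the NDK's cpu-features.h. The NDK calls are guarded so that the file also compiles off-device, where the check simply reports no NEON.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ANDROID__) && defined(__arm__)
#include <cpu-features.h>   /* from the NDK cpufeatures library */
#endif

/* Returns 1 if the CPU reports NEON support at run time, 0 otherwise.
 * Outside 32-bit ARM Android builds this sketch always returns 0. */
int cpu_has_neon(void)
{
#if defined(__ANDROID__) && defined(__arm__)
    uint64_t features = android_getCpuFeatures();
    return android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
           (features & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
#else
    return 0;
#endif
}

/* Portable fallback path (hypothetical example workload). */
static float sum_scalar(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

typedef float (*sum_fn)(const float *, size_t);

/* Picks the caller-supplied NEON routine only when one was provided
 * and the CPU supports it; otherwise falls back to the scalar path. */
sum_fn select_sum(sum_fn neon_version)
{
    if (neon_version != NULL && cpu_has_neon())
        return neon_version;
    return sum_scalar;
}
```

The idea is to call select_sum once at start-up, passing a routine from a source file built with NEON enabled, and to keep the returned pointer for later calls. To use cpu-features.h, the module also needs LOCAL_STATIC_LIBRARIES := cpufeatures and a $(call import-module,android/cpufeatures) line in Android.mk.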

In summary: by default, your programs are the most compatible. Using LOCAL_ARM_MODE := arm may make things slightly faster, at the expense of battery life, due to the use of ARM instructions - and it is as compatible as the default set-up. By adding the APP_ABI := armeabi armeabi-v7a line you will get improved performance on newer devices and remain compatible with older ones, but your .apk file will be larger (because it contains 2 libraries). In order to use NEON instructions, you will need to write special code that detects the capabilities of the CPU at run time, and this only applies to newer devices that can run armeabi-v7a.
