GCC to emit ARM idiv instructions (continued)
Question
I am wondering if this is possible for a Krait 400 CPU. I followed some of the suggestions here.
When I compile with -mcpu=cortex-a15, the code compiles and I do see udiv instructions in the assembly dump.
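However, I would like to know:
- Is it possible to make this work with -march=armv7-a? (No CPU specified; this is how I originally had it.)
- I tried -mcpu=krait2, but since I am not using the Snapdragon LLVM (I don't know yet how much effort that would take), the compiler does not recognize it. Is it possible to take the CPU definition from LLVM and somehow make it available to my compiler?
- Any other method/patch/trick?
My compiler options are as follows: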
/development/android-ndk-r8e/toolchains/arm-linux-androideabi-4.7/prebuilt/linux-x86_64/bin/arm-linux-androideabi-gcc -DANDROID -DNEON -fexceptions -Wno-psabi --sysroot=/development/android-ndk-r8e/platforms/android-14/arch-arm -fpic -funwind-tables -funswitch-loops -finline-limit=300 -fsigned-char -no-canonical-prefixes -march=armv7-a -mfloat-abi=softfp -mfpu=neon -fdata-sections -ffunction-sections -Wa,--noexecstack -marm -fomit-frame-pointer -fstrict-aliasing -O3 -DNDEBUG
The error I get is:
Error: selected processor does not support ARM mode `udiv r1,r1,r3'
As a side note, I have to say that I am just beginning to understand the whole scheme, so I want to proceed in small steps to understand what I am doing.
Thanks in advance.
EDIT 1:
I tried compiling a separate module containing only the udiv instruction. That module is compiled with the -mcpu=cortex-a15 parameter, while the rest of the application is compiled with the -march=armv7-a parameter. The result was (somewhat as expected) that the function call overhead hurt the run-time performance of the application. I could not get inline code, since trying to inline it resulted in the same error that I originally had. I will switch to the Snapdragon LLVM to see if performance is better before trying to reinvent the wheel. Thanks everybody for their answers and tips.
Recommended answer
idiv, an amalgam meaning both sdiv and udiv, is an optional Cortex-A instruction. Whether a given Cortex-A supports it can be queried via the ID_ISAR0 cp15 register, in bits [27:24].
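/* Get idiv support. This cp15 read is privileged, so it must run in kernel mode. */
unsigned int ISAR0;
int idiv;
__asm ("mrc p15, 0, %0, c0, c2, 0" : "=r" (ISAR0));
#ifdef __thumb2__
/* Thumb-2 code: any nonzero value in bits [27:24] means sdiv/udiv exist. */
idiv = (ISAR0 & 0xf000000UL) ? 1 : 0;
#else
/* ARM-mode code: only the value 0010 means sdiv/udiv exist. */
idiv = (ISAR0 & 0xf000000UL) == 0x2000000UL ? 1 : 0;
#endif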
Bits [27:24] are 0001 if only Thumb-2 supports the udiv and sdiv instructions. If bits [27:24] are 0010, then both modes support the instructions.
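To make the field decoding concrete, here is a minimal sketch that separates it from the privileged register read, so it can be tried on any machine with a made-up ID_ISAR0 value. The enum and function names are hypothetical and chosen only for this example; treating unknown field values as "no support" is my own conservative assumption.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; not from any ARM or GCC header. */
enum idiv_level {
    IDIV_NONE = 0,          /* no hardware sdiv/udiv */
    IDIV_THUMB_ONLY = 1,    /* sdiv/udiv in Thumb-2 only */
    IDIV_ARM_AND_THUMB = 2  /* sdiv/udiv in both ARM and Thumb-2 */
};

/* Decode bits [27:24] of an ID_ISAR0 value that was read elsewhere. */
enum idiv_level decode_idiv(uint32_t isar0)
{
    uint32_t field = (isar0 >> 24) & 0xfu;

    switch (field) {
    case 1:
        return IDIV_THUMB_ONLY;
    case 2:
        return IDIV_ARM_AND_THUMB;
    default:
        /* 0, or a value this sketch does not know: assume no support. */
        return IDIV_NONE;
    }
}
```

Code built for ARM mode (as with -marm in the question) would only take the fast path when the result is IDIV_ARM_AND_THUMB.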
Since gcc flags such as -march=armv7-a mean that the code should work on ALL CPUs of this type, and this instruction is optional, it would be an error for gcc to emit it.
You may compile different modules with different flags, such as:
gcc -march=armv7-a -o general.o -c general.c
gcc -mcpu=cortex-a15 -D_USE_IDIV_=1 -o fast_idiv.o -c fast_idiv.c
These modules can be linked together, and the code above can be used to select an appropriate routine at run time. For example, both files may contain:
#include "fir_template.def"
And that file might contain:
#ifdef _USE_IDIV_
#define _FUNC(x) idiv_ ## x
#else
#define _FUNC(x) x
#endif
int _FUNC(fir8)(FILTER8 *filter, SAMPLE *data)
{
....
}
If you know your code will only run on a Cortex-A15, then use the -mcpu option. If you want it to run faster when it can while staying generic (supporting all armv7-a CPUs), then you must identify the CPU as outlined above and select the code dynamically.
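The dynamic selection can be sketched with a function pointer that is chosen once at start-up. This is only an illustration: div_all, idiv_div_all and select_div_all are hypothetical names, and both bodies are plain C here so the sketch compiles anywhere. In the scheme above they would be the same source built twice, once with -march=armv7-a and once with -mcpu=cortex-a15.

```c
#include <stddef.h>

/* Hypothetical signature shared by both builds of the routine. */
typedef void (*div_all_fn)(unsigned *data, size_t n, unsigned d);

/* Stand-in for the generic build (general.c, -march=armv7-a). */
void div_all(unsigned *data, size_t n, unsigned d)
{
    for (size_t i = 0; i < n; i++)
        data[i] /= d;
}

/* Stand-in for the fast build (fast_idiv.c, -mcpu=cortex-a15). */
void idiv_div_all(unsigned *data, size_t n, unsigned d)
{
    for (size_t i = 0; i < n; i++)
        data[i] /= d;
}

/* Pick the implementation once, from the result of the CPU check. */
div_all_fn select_div_all(int have_idiv)
{
    return have_idiv ? idiv_div_all : div_all;
}
```

Note that the selected routine processes a whole buffer per call. Dispatching at that granularity, rather than once per division, keeps the indirect call overhead small compared with the work done inside the loop.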
Addendum: The files above (general.c and fast_idiv.c) could be put in separate shared libraries with the same API. Then interrogate /proc/cpuinfo and see if idiv is supported. Set LD_LIBRARY_PATH (or use dlopen()) to pick the appropriate version. The choice will depend on how much code is involved.
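A rough sketch of the /proc/cpuinfo check follows. It assumes the kernel lists the words idiva (ARM mode) and idivt (Thumb mode) on the "Features" line; older kernels may not report them, so verify this on the target device. The helper names line_has_flag and cpuinfo_has_flag are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `line`. */
int line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p++; /* keep searching after a partial match such as "idivax" */
    }
    return 0;
}

/* Scan a cpuinfo-style file for a "Features" line containing `flag`.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0)
            found = line_has_flag(line, flag);
    }
    fclose(f);
    return found;
}
```

A start-up routine could call cpuinfo_has_flag("/proc/cpuinfo", "idiva") and, when it returns 1, load the library built with -mcpu=cortex-a15; any other result would fall back to the generic build.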