Why does GCC generate Fxxx instead of Vxxx floating point assembly instructions for Cortex-A9?


Question

I'm trying to generate floating point code for the ARM Cortex-A9. I am investigating the performance difference between code generated for the NEON coprocessor and code generated only for the VFPv3 coprocessor. I started with the following simple test program:

#define ASIZE 4

float   A[ASIZE] = {7.0f, 2.0f, 3.0f, 4.0f};
float   B[ASIZE] = {5.0f, 6.0f, 7.0f, 8.0f};
float   C[ASIZE];

int main(void) {
  unsigned int i;

  for (i=0; i<ASIZE; i++)
  {
      C[i] = A[i] + B[i];
  }

  return 0;
}

When I compile it with the following flags

CCFLAGS = -g -c -O3 -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ffast-math -funsafe-math-optimizations 

I get the following assembly output from either GCC or Code Sourcery Lite compilers:

 9:atest.c       **** int main(void) {
23                      .loc 1 9 0
24                      .cfi_startproc
25                      @ args = 0, pretend = 0, frame = 0
26                      @ frame_needed = 0, uses_anonymous_args = 0
27                      @ link register save eliminated.
10:atest.c       **** 
11:atest.c       ****   unsigned int i;
12:atest.c       **** 
13:atest.c       ****   for (i=0; i<ASIZE; i++)
14:atest.c       ****   {
15:atest.c       ****       C[i] = A[i] + B[i];
28                      .loc 1 15 0
29 0000 003000E3        movw    r3, #:lower16:.LANCHOR0
30 0004 002000E3        movw    r2, #:lower16:C
31 0008 003040E3        movt    r3, #:upper16:.LANCHOR0
32 000c DF2A63F4        vld1.64 {d18-d19}, [r3:64]
33 0010 040BD3ED        vldr    d16, [r3, #16]
34 0014 061BD3ED        vldr    d17, [r3, #24]
35 0018 E00D42F2        vadd.f32    q8, q9, q8
36 001c 002040E3        movt    r2, #:upper16:C
16:atest.c       ****   }
17:atest.c       **** 
18:atest.c       ****   return 0;
19:atest.c       **** }

This is what I expected to see. The floating point instructions are in the form "Vxxx".

Now when I change the compiler flag to -mfpu=vfpv3 (or any other permutation such as -mfpu=vfpv3-d16-f16) I see the following:

 9:atest.c       **** int main(void) {
23                      .loc 1 9 0
24                      .cfi_startproc
25                      @ args = 0, pretend = 0, frame = 0
26                      @ frame_needed = 0, uses_anonymous_args = 0
27                      @ link register save eliminated.
28                  .LVL0:
11:atest.c       ****   unsigned int i;
13:atest.c       ****   for (i=0; i<ASIZE; i++)
14:atest.c       ****   {
15:atest.c       ****       C[i] = A[i] + B[i];
29                      .loc 1 15 0
30 0000 003000E3        movw    r3, #:lower16:.LANCHOR0
31 0004 002000E3        movw    r2, #:lower16:C
32 0008 003040E3        movt    r3, #:upper16:.LANCHOR0
33 000c 002040E3        movt    r2, #:upper16:C
34 0010 004A93ED        flds    s8, [r3]
16:atest.c       ****   }
18:atest.c       ****   return 0;
19:atest.c       **** }
35                      .loc 1 19 0
36 0014 0000A0E3        mov r0, #0
15:atest.c       ****   }
37                      .loc 1 15 0
38 0018 046A93ED        flds    s12, [r3, #16]
39 001c 014AD3ED        flds    s9, [r3, #4]
40 0020 056AD3ED        flds    s13, [r3, #20]
41 0024 025A93ED        flds    s10, [r3, #8]
42 0028 067A93ED        flds    s14, [r3, #24]
43 002c 035AD3ED        flds    s11, [r3, #12]
44 0030 077AD3ED        flds    s15, [r3, #28]
45 0034 066A34EE        fadds   s12, s8, s12
46 0038 A66A74EE        fadds   s13, s9, s13
47 003c 077A35EE        fadds   s14, s10, s14
48 0040 A77A75EE        fadds   s15, s11, s15
49 0044 006A82ED        fsts    s12, [r2]
50                  .LVL1:
51 0048 016AC2ED        fsts    s13, [r2, #4]
52                  .LVL2:
53 004c 027A82ED        fsts    s14, [r2, #8]
54                  .LVL3:
55 0050 037AC2ED        fsts    s15, [r2, #12]
56                  .LVL4:
57                      .loc 1 19 0
58 0054 1EFF2FE1        bx  lr
59                      .cfi_endproc
60                  .LFE0:
61                      .fnend

All the floating point assembly instructions are in the form "Fxxx". Why aren't they in the form "Vxxx"? I was expecting to see load instructions that looked like VLD1.32 and add instructions that looked like VADD.F32. When I searched for the instruction "flds" in the official ARM documentation it says that "flds" was used on the ARM9 architecture, not Cortex-A9.

I have tried every combination of the -mcpu, -mfpu and -march compiler flags, but I can't seem to generate floating point assembly instructions in the form "Vxxx" using either the GCC compiler for Linux or the Code Sourcery Lite compiler for Linux. What am I doing wrong?

Solution

What am I doing wrong?

Absolutely nothing, unless you count using an old disassembler. The instructions are the same, the encodings are the same, it's just the recommended assembly mnemonics that changed. Clearly whatever disassembler you're using (I don't recognise that output format) hasn't been updated since ARM introduced the UAL syntax, so has disassembled to the old mnemonics. Feel free to try another disassembler (e.g. a recent-ish objdump) to compare, but as I say it's purely a difference in representation - nothing to worry about.
