ARM assembly: auto-increment register on store

Question

Is it possible to auto-increment the base register of an STR using [Rn]!? I've looked through the documentation but haven't been able to find a definitive answer, mainly because the syntax is presented for LDR and STR together. In theory it should work for both, but I couldn't find any examples of auto-increment on a store (on loads it works fine).

I've written a small program that stores two numbers in an array. When it finishes, out should contain {1, 2}, but the second store overwrites the first element, as if the auto-increment isn't working.

#include <stdio.h>

int main()
{
        int out[]={0, 0};
        asm volatile (
        "mov    r0, #1          \n\t"
        "str    r0, [%0]!       \n\t"
        "add    r0, r0, #1      \n\t"
        "str    r0, [%0]        \n\t"
        :: "r"(out)
        : "r0" );
        printf("%d %d\n", out[0], out[1]);
        return 0;
}
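For reference, `[Rn, #off]!` is pre-indexed addressing with writeback: the address used is Rn + off, and that same value is written back to Rn. Assuming the assembler accepts `[Rn]!` as `[Rn, #0]!`, the base would be written back unchanged, which matches the symptom above. Post-indexed `[Rn], #off` instead uses Rn as-is and then adds off. The plain-C model below is only an illustrative sketch so it runs on any host; the `ArmBase` type and the `str_pre_index`/`str_post_index` functions are hypothetical names, not part of any ARM toolchain.

```c
#include <stdint.h>

/* Hypothetical stand-in for an ARM base register Rn. */
typedef struct {
    uintptr_t addr;
} ArmBase;

/* Models STR Rt, [Rn, #off]! (pre-indexed with writeback):
   the store address is Rn + off, and that same value is written back to Rn.
   With off == 0 (what [Rn]! amounts to) Rn does not move. */
static uintptr_t str_pre_index(ArmBase *rn, intptr_t off)
{
    rn->addr += (uintptr_t)off;   /* new base = Rn + off */
    return rn->addr;              /* the store uses the new base */
}

/* Models STR Rt, [Rn], #off (post-indexed):
   the store uses the old Rn, then Rn is advanced by off. */
static uintptr_t str_post_index(ArmBase *rn, intptr_t off)
{
    uintptr_t ea = rn->addr;      /* store address is the old base */
    rn->addr += (uintptr_t)off;   /* base advanced afterwards */
    return ea;
}
```

In this model two consecutive `str_pre_index(&rn, 0)` calls return the same address, which is why the second store in the question lands on `out[0]`.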


While the answer was right for regular loads and stores, I found that the optimizer messes up auto-increment on vector instructions such as vldm/vstm. For instance, the following program

#include <stdio.h>

int main()
{
        volatile int *in = new int[16];
        volatile int *out = new int[16];

        for (int i=0;i<16;i++) in[i] = i;

        asm volatile (
        "vldm   %0!, {d0-d3}            \n\t"
        "vldm   %0,  {d4-d7}            \n\t"
        "vstm   %1!, {d0-d3}            \n\t"
        "vstm   %1,  {d4-d7}            \n\t"
        :: "r"(in), "r"(out)
        : "memory" );

        for (int i=0;i<16;i++) printf("%d\n", out[i]);
        return 0;
}

compiled with

g++ -O2 -march=armv7-a -mfpu=neon main.cpp -o main

will produce garbage for the last 8 values, because the optimizer keeps the incremented pointer and uses it for the printf loop. In other words, out[i] is actually out[i+8]: the first 8 printed values are the last 8 of the array, and the rest are read from out-of-bounds memory.

I've tried with different combinations of the volatile keyword throughout the code, but the behavior changes only if I compile with the -O0 flag or if I use a volatile vector instead of a pointer and new, like

volatile int out[16];
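A likely explanation, going by GCC's extended-asm rules, is that `%0` and `%1` are input-only operands, which an asm statement must not modify, yet the `!` writeback modifies them; GCC then keeps using those registers as if they still held `in` and `out`. The sketch below shows one possible fix, assuming GCC targeting 32-bit ARM with NEON: the pointers are local copies declared as read-write `"+r"` operands, and the d registers written by `vldm` are listed as clobbers. The function name `copy16` and the `memcpy` fallback for other targets are hypothetical additions so the sketch also compiles elsewhere.

```c
#include <string.h>

/* Copies 16 ints from src to dst. */
static void copy16(int *dst, const int *src)
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    /* src and dst are this function's own copies, marked "+r" so GCC
       knows the "!" writeback changes them; the caller's pointers are
       left untouched. */
    asm volatile (
        "vldm   %0!, {d0-d3}    \n\t"   /* load in[0..7], src += 32 bytes */
        "vldm   %0,  {d4-d7}    \n\t"   /* load in[8..15] */
        "vstm   %1!, {d0-d3}    \n\t"   /* store out[0..7], dst += 32 bytes */
        "vstm   %1,  {d4-d7}    \n\t"   /* store out[8..15] */
        : "+r"(src), "+r"(dst)
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "memory");
#else
    /* Portable fallback for non-NEON targets (hypothetical addition). */
    memcpy(dst, src, 16 * sizeof(int));
#endif
}
```

Because only the parameters are advanced, a caller can still print `out[0]` to `out[15]` through its original pointer afterwards.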

Answer

For store and load you do this:

ldr r0,[r1],#4
str r0,[r2],#4

Whatever you put at the end (4 in this case) is added to the base register (r1 in the ldr example, r2 in the str example) after the register has been used as the address, as part of the same instruction. It is very much like

unsigned int a,*b,*c;
...
a = *b++;
*c++ = a;
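The C analogy above can be made concrete. The sketch below assumes GCC targeting 32-bit ARM in ARM state and copies words with the same post-indexed `ldr`/`str` pair; the function name `copy_words` is hypothetical, and the `#else` branch is the plain C equivalent (`a = *b++; *c++ = a;`) so the sketch also compiles on other hosts. The pointers are `"+r"` operands because the post-indexed writeback changes them.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies n 32-bit words from src to dst, one word at a time. */
static void copy_words(uint32_t *dst, const uint32_t *src, size_t n)
{
#if defined(__arm__) && !defined(__thumb__)
    uint32_t tmp;
    while (n--) {
        /* Post-indexed: use the address first, then add 4 to the base. */
        asm volatile (
            "ldr    %0, [%1], #4    \n\t"   /* tmp = *src; src += 4 bytes */
            "str    %0, [%2], #4    \n\t"   /* *dst = tmp; dst += 4 bytes */
            : "=&r"(tmp), "+r"(src), "+r"(dst)
            :
            : "memory");
    }
#else
    /* Portable equivalent of the same post-increment pattern. */
    while (n--) {
        uint32_t a = *src++;
        *dst++ = a;
    }
#endif
}
```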

EDIT: you need to look at the disassembly to see what is actually going on. I am using the latest CodeSourcery toolchain, now called Sourcery CodeBench Lite, from Mentor Graphics:

arm-none-linux-gnueabi-gcc (Sourcery CodeBench Lite 2011.09-70) 4.6.1

#include <stdio.h>
int main ()
{
        int out[]={0, 0};
        asm volatile (
        "mov    r0, #1          \n\t"
        "str    r0, [%0], #4       \n\t"
        "add    r0, r0, #1      \n\t"
        "str    r0, [%0]        \n\t"
        :: "r"(out)
        : "r0" );
        printf("%d %d\n", out[0], out[1]);
        return 0;
}


arm-none-linux-gnueabi-gcc str.c -O2  -o str.elf

arm-none-linux-gnueabi-objdump -D str.elf > str.list


00008380 <main>:
    8380:   e92d4010    push    {r4, lr}
    8384:   e3a04000    mov r4, #0
    8388:   e24dd008    sub sp, sp, #8
    838c:   e58d4000    str r4, [sp]
    8390:   e58d4004    str r4, [sp, #4]
    8394:   e1a0300d    mov r3, sp
    8398:   e3a00001    mov r0, #1
    839c:   e4830004    str r0, [r3], #4
    83a0:   e2800001    add r0, r0, #1
    83a4:   e5830000    str r0, [r3]
    83a8:   e59f0014    ldr r0, [pc, #20]   ; 83c4 <main+0x44>
    83ac:   e1a01004    mov r1, r4
    83b0:   e1a02004    mov r2, r4
    83b4:   ebffffe5    bl  8350 <_init+0x20>
    83b8:   e1a00004    mov r0, r4
    83bc:   e28dd008    add sp, sp, #8
    83c0:   e8bd8010    pop {r4, pc}
    83c4:   0000854c    andeq   r8, r0, ip, asr #10

So

sub sp, sp, #8

allocates space for the two local ints out[0] and out[1], and

mov r4,#0
str r4,[sp]
str r4,[sp,#4]

sets both to zero because of the initializer. Then comes the inline assembly:

8398:   e3a00001    mov r0, #1
839c:   e4830004    str r0, [r3], #4
83a0:   e2800001    add r0, r0, #1
83a4:   e5830000    str r0, [r3]

followed by the printf call:

83a8:   e59f0014    ldr r0, [pc, #20]   ; 83c4 <main+0x44>
83ac:   e1a01004    mov r1, r4
83b0:   e1a02004    mov r2, r4
83b4:   ebffffe5    bl  8350 <_init+0x20>

Now it is clear why it didn't work: you didn't declare out as volatile. Nothing in the asm statement tells the compiler that it writes to out, so the code gives it no reason to go back to RAM for the values of out[0] and out[1] before the printf. The compiler knows that r4 holds the value of both out[0] and out[1] (zero), and there is so little code in this function that it never had to evict r4 and reuse it, so it passes r4 to printf.

If you change it to be volatile

    volatile int out[]={0, 0};

Then you should get the desired result:

83a8:   e59f0014    ldr r0, [pc, #20]   ; 83c4 <main+0x44>
83ac:   e59d1000    ldr r1, [sp]
83b0:   e59d2004    ldr r2, [sp, #4]
83b4:   ebffffe5    bl  8350 <_init+0x20>

The setup for printf now reads out[0] and out[1] from RAM.
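Declaring `out` volatile works for this example. Another option, assuming GCC's documented extended-asm semantics, is to tell the compiler what the asm actually does: make the pointer a read-write `"+r"` operand (the post-indexed store changes it) and add a `"memory"` clobber so that `out[0]` and `out[1]` are reloaded after the asm instead of being taken from r4. The helper name `store_pair` is hypothetical, and the C fallback only exists so the sketch compiles on non-ARM hosts.

```c
/* Stores 1 to p[0] and 2 to p[1]. */
static void store_pair(int *p)
{
#if defined(__arm__) && !defined(__thumb__)
    asm volatile (
        "mov    r0, #1          \n\t"
        "str    r0, [%0], #4    \n\t"   /* store to p[0], then p += 4 bytes */
        "add    r0, r0, #1      \n\t"
        "str    r0, [%0]        \n\t"   /* store to p[1] */
        : "+r"(p)             /* p is modified by the writeback */
        :
        : "r0", "memory");    /* r0 is scratch; memory is written */
#else
    /* Portable equivalent for non-ARM hosts. */
    *p++ = 1;
    *p = 2;
#endif
}
```

With this, a plain `int out[] = {0, 0};` followed by `store_pair(out);` should print `1 2` without needing volatile.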
