ARM assembly: can’t find a register in class ‘GENERAL_REGS’ while reloading ‘asm’

Problem description

I am trying to implement a function that multiplies a 32-bit operand by a 256-bit operand in ARM assembly on an ARM Cortex-A8. The problem is that I am running out of registers, and I have no idea how to reduce the number of registers used here. Here is my function:

typedef struct UN_256fe{
    uint32_t uint32[8];
}UN_256fe;

typedef struct UN_288bite{
    uint32_t uint32[9];
}UN_288bite;

void multiply32x256(uint32_t A, UN_256fe* B, UN_288bite* res){

    asm (
        "umull          r3, r4, %9, %10;\n\t"
        "mov            %0, r3;         \n\t"/*res->uint32[0] = r3*/
        "umull          r3, r5, %9, %11;\n\t"
        "adds           r6, r3, r4;     \n\t"/*res->uint32[1] = r3 + r4*/
        "mov            %1, r6;         \n\t"
        "umull          r3, r4, %9, %12;\n\t"
        "adcs           r6, r5, r3;     \n\t"
        "mov            %2, r6;         \n\t"/*res->uint32[2] = r6*/
        "umull          r3, r5, %9, %13;\n\t"
        "adcs           r6, r3, r4;     \n\t"
        "mov            %3, r6;         \n\t"/*res->uint32[3] = r6*/
        "umull          r3, r4, %9, %14;\n\t"
        "adcs           r6, r3, r5;     \n\t"
        "mov            %4, r6;         \n\t"/*res->uint32[4] = r6*/
        "umull          r3, r5, %9, %15;\n\t"
        "adcs           r6, r3, r4;     \n\t"
        "mov            %5, r6;         \n\t"/*res->uint32[5] = r6*/
        "umull          r3, r4, %9, %16;\n\t"
        "adcs           r6, r3, r5;     \n\t"
        "mov            %6, r6;         \n\t"/*res->uint32[6] = r6*/
        "umull          r3, r5, %9, %17;\n\t"
        "adcs           r6, r3, r4;     \n\t"
        "mov            %7, r6;         \n\t"/*res->uint32[7] = r6*/
        "adc            r6, r5, #0 ;    \n\t"
        "mov            %8, r6;         \n\t"/*res->uint32[8] = r6*/

        : "=r"(res->uint32[8]), "=r"(res->uint32[7]), "=r"(res->uint32[6]), "=r"(res->uint32[5]), "=r"(res->uint32[4]),
          "=r"(res->uint32[3]), "=r"(res->uint32[2]), "=r"(res->uint32[1]), "=r"(res->uint32[0])
        : "r"(A), "r"(B->uint32[7]), "r"(B->uint32[6]), "r"(B->uint32[5]),
          "r"(B->uint32[4]), "r"(B->uint32[3]), "r"(B->uint32[2]), "r"(B->uint32[1]), "r"(B->uint32[0]), "r"(temp)
        : "r3", "r4", "r5", "r6", "cc", "memory");

}

EDIT-1: I updated my clobber list based on the first comment, but I still get the same error.

Recommended answer

A simple solution is to break this up and not use clobbers. The single asm statement asks for nine register outputs, ten register inputs, and four clobbered registers at once, which leaves the register allocator almost nothing to work with on a 16-register machine. Declare the temporaries as C variables ('tmp1', etc.) instead. Try not to use any mov instructions; let the compiler emit them if it has to. The compiler will use its own algorithm to figure out the best flow of data, but it cannot reuse a register that you have listed as a clobber. The way it is now, you also make it load all of the inputs from memory before the first assembler instruction executes. This is bad, because you want the memory accesses and the CPU ALU work to pipeline.

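void multiply32x256(uint32_t A, UN_256fe* B, UN_288bite* res)
{

  uint32_t mulhi1, mullo1;
  uint32_t mulhi2, mullo2;
  uint32_t tmp;

  asm("umull          %0, %1, %2, %3;\n\t"
       : "=r" (mullo1), "=r" (mulhi1)
       : "r"(A), "r"(B->uint32[7])
  );
  res->uint32[8] = mullo1; /* was 'mov %0, r3;' */
  /* "=&r" (early clobber): umull writes %0 and %1 before adds reads %5. */
  asm volatile("umull          %0, %1, %3, %4;\n\t"
      "adds           %2, %0, %5;     \n\t"/* tmp = mullo2 + mulhi1; was 'adds r6, r3, r4;' */
     : "=&r" (mullo2), "=&r" (mulhi2), "=r" (tmp)
     : "r"(A), "r"(B->uint32[6]), "r"(mulhi1)
     : "cc"
    );
  res->uint32[7] = tmp; /* was 'mov %1, r6;' */
  /* ... etc */
  /* Caveat: GCC does not promise to preserve the carry flag between separate
     asm statements, so the carry produced by 'adds' must be consumed inside
     the same statement or passed on as a register value. */
}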

The whole purpose of the GCC inline assembler is not to write assembler directly in a C file. It is to use the register allocation logic of the compiler and to do something that cannot easily be done in C; in your case, that is the carry logic.

By not making it one huge asm statement, you let the compiler schedule the loads from memory as it needs new registers. It can also pipeline your UMULL ALU activity with the load/store unit.
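One way to keep the asm statements small without relying on the carry flag surviving between them is to fold the carry into each step and hand it on as an ordinary register value. The following is only a sketch under stated assumptions: `mul_add_step` and `multiply32x256_steps` are hypothetical names that do not appear in the question, the asm path assumes GCC targeting ARM or Thumb-2 code on an ARMv7-A core such as the Cortex-A8, and the word order follows the question (`B->uint32[7]` and `res->uint32[8]` are the least significant words). On other targets the helper falls back to plain C so the logic can still be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct UN_256fe   { uint32_t uint32[8]; } UN_256fe;
typedef struct UN_288bite { uint32_t uint32[9]; } UN_288bite;

/* Hypothetical helper: computes a * b + carry_in (which always fits in
   64 bits) and returns the low and high 32-bit halves. */
static inline void mul_add_step(uint32_t a, uint32_t b, uint32_t carry_in,
                                uint32_t *lo, uint32_t *hi)
{
#if defined(__GNUC__) && defined(__arm__)
    uint32_t l, h;
    /* "=&r": both outputs are written by umull before carry_in (%4) is
       read, so they must not share a register with any input. */
    __asm__ ("umull  %0, %1, %2, %3\n\t"
             "adds   %0, %0, %4\n\t"
             "adc    %1, %1, #0"
             : "=&r"(l), "=&r"(h)
             : "r"(a), "r"(b), "r"(carry_in)
             : "cc");
    *lo = l;
    *hi = h;
#else
    /* Portable fallback with the same result. */
    uint64_t t = (uint64_t)a * b + carry_in;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
#endif
}

/* res = A * B, walking from the least significant word (index 7) upward. */
void multiply32x256_steps(uint32_t A, const UN_256fe *B, UN_288bite *res)
{
    assert(B != NULL && res != NULL);
    uint32_t carry = 0;
    for (int i = 7; i >= 0; i--) {
        uint32_t lo, hi;
        mul_add_step(A, B->uint32[i], carry, &lo, &hi);
        res->uint32[i + 1] = lo;
        carry = hi;
    }
    res->uint32[0] = carry; /* most significant word */
}
```

Each asm statement needs only five registers, all chosen by the compiler, and the loads and stores stay in C where the compiler can schedule them around the multiplies. The price is one extra `adc` per word compared with a single unbroken `adcs` chain.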

You should only use a clobber if an instruction implicitly clobbers a specific register. You may also use something like,

register int *p1 asm ("r0");

and use that as an output. However, I don't know of any ARM instructions like this, apart from those that might alter the stack, which your code doesn't use, and of course those that affect the carry flag.

GCC knows that memory changes if it is listed as an input/output operand, so you don't need a memory clobber. In fact, it is detrimental: the memory clobber is a compiler memory barrier, and it forces memory to be written at that point even when the compiler might be able to schedule the write for later.
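As a rough illustration of that point, an asm statement that really does write memory can name the written object as an "=m" output instead of listing "memory" as a clobber. This is a minimal sketch, not code from the question: `store_word` and `store_then_read` are hypothetical names, and the asm path assumes GCC targeting 32-bit ARM; other targets take the plain C branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores value to *dst from inside an asm statement. */
static inline void store_word(uint32_t *dst, uint32_t value)
{
    assert(dst != NULL);
#if defined(__GNUC__) && defined(__arm__)
    /* "=m"(*dst) tells GCC exactly which object the asm writes, so the
       clobber list stays empty and no compiler memory barrier is implied. */
    __asm__ ("str %1, %0" : "=m"(*dst) : "r"(value));
#else
    *dst = value; /* portable fallback */
#endif
}

/* The read after the store must see the new value, because the compiler
   knows from the output operand that *slot was modified. */
uint32_t store_then_read(uint32_t *slot, uint32_t value)
{
    store_word(slot, value);
    return *slot;
}
```

Only the named object is treated as modified, so values the compiler holds in registers for unrelated memory do not have to be flushed or reloaded around the statement.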

The moral is to use the GCC inline assembler to work with the compiler. If you code in assembler and you have huge routines, the register use can become complex and confusing. A typical assembler programmer will keep only one thing in each register per routine, but that is not always the best use of registers. When the code gets larger, the compiler will shuffle the data around in a fairly smart way that is difficult to beat (and, in my opinion, not very satisfying to code by hand).

You might want to look at the GMP library, which has lots of ways to efficiently tackle some of the same issues your code seems to have.
