Porting from 32 to 64-bit by just changing all the register names from eXX to rXX makes factorial return 0?


Problem description


How fortunate it is for all of us learning the art of computer programming to have access to a community such as Stack Overflow! I have decided to learn how to program computers, and I am doing so with an e-book called 'Programming From the Ground Up', which teaches the reader how to write programs in assembly language in the GNU/Linux environment.

My progress in the book has reached the point of writing a program that computes the factorial of the integer 4 with a function. The program assembles with GCC's assembler and runs without any errors. However, the function in my program does not return the right answer! The factorial of 4 is 24, but the program returns a value of 0, and I do not know why.

Here is the code for your consideration:

.section .data

.section .text

.globl _start
.globl factorial

_start:

push $4                    #this is the function argument
call factorial             #the function is called
add $4, %rsp               #the stack is restored to its original 
                           #state before the function was called
mov %rax, %rbx             #this instruction will move the result 
                           #computed by the function into the rbx 
                           #register and will serve as the return 
                           #value 
mov $1, %rax               #1 must be placed inside this register for 
                           #the exit system call
int $0x80                  #exit interrupt

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #saves the base-pointer
mov %rsp, %rbp             #moves the stack-pointer into the base-
                           #pointer register so that data in the stack 
                           #can be referenced as indexes of the base-
                           #pointer
mov $1, %rax               #the rax register will contain the product 
                           #of the factorial
mov 8(%rbp), %rcx          #moves the function argument into %rcx
start_loop:                #the process loop begins
cmp $1, %rcx               #this is the exit condition for the loop
je loop_exit               #if the value in %rcx reaches 1, exit loop
imul %rcx, %rax            #multiply the current integer of the 
                           #factorial by the value stored in %rax
dec %rcx                   #reduce the factorial integer by 1
jmp start_loop             #unconditional jump to the start of loop
loop_exit:                 #the loop exit begins
mov %rbp, %rsp             #restore the stack-pointer
pop %rbp                   #remove the saved base-pointer from stack
ret                        #return

Solution

TL;DR: the factorial of the return address overflowed %rax, leaving 0, because the port was done wrong.


Porting 32-bit code to 64-bit is not as simple as changing all the register names. That might get it to assemble, but as you found, even this simple program behaves differently. In x86-64, push %reg and call both push 64-bit values, and each modifies rsp by 8. You would see this if you single-stepped your code with a debugger. (See the bottom of the x86 tag wiki for info on using gdb for asm.)

You're following a book that uses 32-bit examples, so you should probably just build them as 32-bit executables instead of trying to port them to 64-bit before you know how.
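As a rough sketch of what building as 32-bit could look like, here is the program with eXX register names, which is presumably close to the book's form (this is a reconstruction from the question's code, not a quote from the book). The file name factorial32.s is hypothetical, and the build commands in the comments assume GNU binutils on an x86-64 Linux system whose kernel can run 32-bit executables.

```asm
# factorial32.s (hypothetical file name): 32-bit version of the program.
# Assumed build and run steps with GNU binutils:
#   as --32 factorial32.s -o factorial32.o
#   ld -m elf_i386 factorial32.o -o factorial32
#   ./factorial32; echo $?
.section .text
.globl _start
.globl factorial

_start:
    push $4                 # esp -= 4, (esp) = dword 4
    call factorial          # esp -= 4, (esp) = return address
    add $4, %esp            # discard the 4-byte argument
    mov %eax, %ebx          # exit status = factorial result
    mov $1, %eax            # __NR_exit in the 32-bit ABI
    int $0x80               # 32-bit system call

.type factorial, @function
factorial:
    push %ebp               # 0(%ebp) will hold the saved ebp
    mov %esp, %ebp
    mov $1, %eax            # running product
    mov 8(%ebp), %ecx       # 4(%ebp) = return address, 8(%ebp) = argument
start_loop:
    cmp $1, %ecx
    je loop_exit
    imul %ecx, %eax
    dec %ecx
    jmp start_loop
loop_exit:
    mov %ebp, %esp
    pop %ebp
    ret
```

In 32-bit mode every push and call moves the stack pointer by 4, so the offsets 4 and 8 used by the book are consistent with each other, and the exit status reported by echo $? should be 24.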


Your sys_exit() using the 32-bit int 0x80 ABI still works (see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?"), but you will run into trouble with system calls if you try to pass 64-bit pointers, because that ABI only looks at the low 32 bits of each register. Use the 64-bit ABI.
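For comparison, a minimal sketch of an exit through the native 64-bit system-call ABI might look like this. The file name exit64.s and the hard-coded status 24 are only for illustration; the call number 60 is __NR_exit from asm/unistd_64.h, and the first argument goes in %rdi instead of %ebx.

```asm
# exit64.s (hypothetical file name): exit(24) via the 64-bit syscall ABI.
# Assumed build steps:
#   as exit64.s -o exit64.o
#   ld exit64.o -o exit64
.section .text
.globl _start

_start:
    mov $24, %edi           # first syscall argument in %rdi (writing %edi zero-extends)
    mov $60, %eax           # __NR_exit from asm/unistd_64.h
    syscall                 # 64-bit system call instruction
```

Running ./exit64 and then echo $? should print 24.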

You will also run into problems if you want to call any library functions, because the standard function-calling convention is different, too. See "Why parameters stored in registers and not on the stack in x86-64 Assembly?", the 64-bit ABI link, and the other calling-convention docs in the x86 tag wiki.
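To illustrate the difference, here is a sketch of the same factorial written for the standard x86-64 System V convention, where the first integer argument arrives in %rdi and the result is returned in %rax. Using jbe instead of je is a small change of mine so that an argument of 0 also terminates; it is not part of the original program.

```asm
# Sketch: factorial taking its argument in %rdi (x86-64 System V convention).
.section .text
.globl _start
.globl factorial

_start:
    mov $4, %edi            # first integer argument goes in %rdi
    call factorial          # result comes back in %rax
    mov %eax, %edi          # exit status = result
    mov $60, %eax           # __NR_exit from asm/unistd_64.h
    syscall

.type factorial, @function
factorial:
    mov $1, %eax            # running product (zero-extends into %rax)
start_loop:
    cmp $1, %rdi
    jbe loop_exit           # stop once the counter is 1 (or if it started at 0)
    imul %rdi, %rax         # product *= counter
    dec %rdi
    jmp start_loop
loop_exit:
    ret
```

With the argument in a register there is nothing to clean up after the call and no stack offset to get wrong, which is part of why the 64-bit convention passes the first few arguments this way.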


But you're not doing any of that, so the problem with your program simply comes down to not accounting for the doubled "stack width" in x86-64. Your factorial function reads the return address as its argument.

Here's your code, commented to explain what it actually does:

push $4                    # rsp-=8.  (rsp) = qword 4
                           # non-standard calling convention with args on the stack.
call factorial             # rsp-=8.  (rsp) = return address.  RIP=factorial
add $4, %rsp               # misaligns the stack: rsp now points to the top half of the qword 4 you pushed earlier.
                           # if this was in a function that wanted to return, you'd be screwed.

mov %rax, %rbx             # copy return value to first arg of system call
mov $1, %rax               # eax = __NR_exit from asm/unistd_32.h, wasting 2 bytes vs. mov $1, %eax
int $0x80                  # 32-bit ABI system call, eax=call number, ebx=first arg.  sys_exit(factorial(4))

So the caller is sort of fine (for the non-standard 64-bit calling convention you've invented that passes all args on the stack). You might as well omit the add to %rsp entirely, since you're about to exit without touching the stack any further.

.type factorial, @function #defines the code below as being a function

factorial:                 #function label
push %rbp                  #rsp-=8, (rsp) = rbp
mov %rsp, %rbp             # make a traditional stack frame

mov $1, %rax               #retval = 1.  (Wasting 2 bytes vs. the exactly equivalent mov $1, %eax)

mov 8(%rbp), %rcx          #load the return address into %rcx

... and calculate the factorial

For static executables (and dynamically linked executables that aren't ASLR enabled with PIE), _start is normally at 0x4000c0, so the return address your function reads is just past that. Your program will still run nearly instantaneously on a modern CPU, because about 0x4000c0 (4194496) loop iterations at the 3-cycle latency of imul is still only about 12.6 million core clock cycles. On a 4GHz CPU, that's about 3 milliseconds of CPU time.

If you'd made a position-independent executable by linking with gcc foo.o on a recent distro, _start would have an address like 0x5555555545a0, and your function would have taken ~70368 seconds (about 19.5 hours) to run on a 4GHz CPU with 3-cycle imul latency.

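4194496! includes many even numbers, so its binary representation has many trailing zeros. Each even factor contributes at least one more factor of 2, and %rax only keeps the low 64 bits of the product, so once the product has accumulated 64 factors of 2 those low 64 bits are all zero. The whole %rax will be zero long before you're done multiplying by every number from 0x4000c0 down to 1.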

The exit status of a Linux process is only the low 8 bits of the integer you pass to sys_exit() (because the wstatus is only a 32-bit int and includes other stuff, like what signal ended the process. See wait4(2)). So even with small args, it doesn't take much to get an exit status of 0: 10! = 3628800 = 0x375f00 already has a zero low byte.
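Putting the diagnosis above together, a minimal sketch of a working 64-bit port that keeps the book's stack-argument convention could look like the following. The two essential changes are reading the argument from 16(%rbp) instead of 8(%rbp) and popping 8 bytes instead of 4 after the call. Switching the exit to the 64-bit syscall ABI is an optional extra change following the advice above, not something the fix depends on.

```asm
# Sketch: 64-bit port that still passes the argument on the stack.
.section .text
.globl _start
.globl factorial

_start:
    push $4                 # rsp -= 8, (rsp) = qword 4
    call factorial          # rsp -= 8, (rsp) = return address
    add $8, %rsp            # discard the full 8-byte argument slot
    mov %eax, %edi          # exit status = factorial result
    mov $60, %eax           # __NR_exit from asm/unistd_64.h
    syscall

.type factorial, @function
factorial:
    push %rbp               # 0(%rbp) will hold the saved rbp
    mov %rsp, %rbp
    mov $1, %eax            # running product (zero-extends into %rax)
    mov 16(%rbp), %rcx      # 8(%rbp) = return address, 16(%rbp) = argument
start_loop:
    cmp $1, %rcx
    je loop_exit
    imul %rcx, %rax
    dec %rcx
    jmp start_loop
loop_exit:
    mov %rbp, %rsp
    pop %rbp
    ret
```

Assembled and linked with as and ld as a normal 64-bit executable, this should exit with status 24. Like the original, the loop assumes an argument of at least 1.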
