Printing an integer as a string with AT&T syntax, with Linux system calls instead of printf

Question

I have written an assembly program in AT&T syntax to display the factorial of a number, but it's not working. Here is my code:

.text 

.globl _start

_start:
movq $5,%rcx
movq $5,%rax


Repeat:                     #function to calculate factorial
   decq %rcx
   cmp $0,%rcx
   je print
   imul %rcx,%rax
   cmp $1,%rcx
   jne Repeat
# Now result of factorial stored in rax
print:
     xorq %rsi, %rsi

  # function to print integer result digit by digit by pushing in 
       #stack
  loop:
    movq $0, %rdx
    movq $10, %rbx
    divq %rbx
    addq $48, %rdx
    pushq %rdx
    incq %rsi
    cmpq $0, %rax
    jz   next
    jmp loop

  next:
    cmpq $0, %rsi
    jz   bye
    popq %rcx
    decq %rsi
    movq $4, %rax
    movq $1, %rbx
    movq $1, %rdx
    int  $0x80
    addq $4, %rsp
    jmp  next
bye:
movq $1,%rax
movq $0, %rbx
int  $0x80


.data
   num : .byte 5

This program prints nothing. I also used gdb to step through it: it works fine until the loop section, but when execution reaches next, random values start appearing in various registers. Please help me debug it so that it prints the factorial.

Answer

As @ped7g points out, you're doing several things wrong: using the int 0x80 32-bit ABI in 64-bit code, and passing character values instead of pointers to the write() system call.
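
The second bug is easiest to see in C. The question pops the ASCII code of a digit into %rcx. The int 0x80 ABI treats %rcx as the buffer pointer, so the kernel is asked to read from address 48..57. A minimal sketch below keeps the question's structure of one write() per digit. The function name print_digits_per_syscall and its fd parameter are hypothetical, added for illustration. The point is that write() takes the address of the byte, so each digit has to sit in memory and its address is what gets passed. In the 64-bit asm, that means pointing %rsi at the digit, for example at the stack slot it was pushed to. Separately, the question's addq $4, %rsp after popq %rcx is not needed: popq already adds 8 to %rsp, so the extra add throws off the remaining pops.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: print v in decimal to fd with one write() per digit,
   mirroring the question's push-all-digits-then-pop-and-print structure.
   Returns 0 on success, -1 if any write() fails. */
int print_digits_per_syscall(int fd, unsigned long v)
{
    char digits[20];   /* a 64-bit unsigned long has at most 20 decimal digits */
    size_t n = 0;

    do {               /* "push" ASCII digits, least-significant first */
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    while (n > 0) {    /* "pop" them, most-significant first */
        char c = digits[--n];
        /* write() needs the ADDRESS of the byte: pass &c, not c itself. */
        if (write(fd, &c, 1) != 1)
            return -1;
    }
    return 0;
}
```

This still costs one full system call per digit, which is the inefficiency the rest of this answer avoids.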

Here's how to print an integer in 64-bit Linux, the simple and somewhat-efficient way. See "Why does GCC use multiplication by a strange number in implementing integer division?" for how to avoid div r64 for division by 10, because it's very slow (21 to 83 cycles on Intel Skylake). A multiplicative inverse would make this function actually efficient, not just "somewhat". (But of course there'd still be room for optimizations...)
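
A C sketch of the multiplicative-inverse trick for an unsigned 64-bit divide by 10. It multiplies by 0xCCCCCCCCCCCCCCCD, which is ceil(2^67 / 10), keeps the high 64 bits of the 128-bit product, and shifts right by 3 more (67 bits in total). div10 is an illustrative name. unsigned __int128 is a GCC/Clang extension, not standard C.

```c
#include <stdint.h>

/* floor(x / 10) for every 64-bit x, without a div instruction.
   The high half of x * 0xCCCCCCCCCCCCCCCD is what `mul` leaves in %rdx;
   the extra >> 3 corresponds to the `shr $3, %rdx` that gcc emits. */
static inline uint64_t div10(uint64_t x)
{
    unsigned __int128 product = (unsigned __int128)x * 0xCCCCCCCCCCCCCCCDull;
    return (uint64_t)(product >> 67);
}

/* The remainder (the next digit) falls out with a multiply and a subtract. */
static inline unsigned mod10_from_quotient(uint64_t x, uint64_t q)
{
    return (unsigned)(x - q * 10);
}
```

With optimization enabled, gcc and clang already do this for a constant divisor, so val / 10 in C compiles to the mul/shr sequence. It only has to be done by hand when writing the asm yourself.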

System calls are expensive (probably thousands of cycles for write(1, buf, 1)), and doing a syscall inside the loop steps on registers so it's inconvenient and clunky as well as inefficient. We should write the characters into a small buffer, in printing order (most-significant digit at the lowest address), and make a single write() system call on that.
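
For comparison, here is a C sketch of the single-write approach that print_uint64 (below) implements in asm. Digits are stored into a 21-byte local buffer from the end backwards, and then one write() covers the whole string. print_uint64_fd and its fd parameter are hypothetical: the asm always writes to fd 1, and the parameter just makes the sketch easy to test.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Build "digits\n" back-to-front in a local buffer, then make ONE write().
   Returns write()'s result: bytes written, or -1 on error. */
ssize_t print_uint64_fd(int fd, uint64_t value)
{
    char buf[21];                  /* up to 20 digits + '\n' */
    char *end = buf + sizeof buf;  /* one past the end */
    char *p = end;

    *--p = '\n';                   /* trailing newline goes in first */
    do {
        *--p = (char)('0' + value % 10);  /* least-significant digit first */
        value /= 10;
    } while (value != 0);                 /* do/while, so 0 prints as "0" */

    /* MSD is now at the lowest address: length = one_past_end - start */
    return write(fd, p, (size_t)(end - p));
}
```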

But then we need a buffer. The maximum length of a 64-bit integer is only 20 decimal digits, so we can just use some stack space. In x86-64 Linux, we can use stack space below RSP (up to 128B) without "reserving" it by modifying RSP. This is called the red-zone.
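
The 20-digit bound comes from UINT64_MAX = 18446744073709551615. A small C sketch confirms that 20 digits plus a newline fit easily in the 128-byte red zone. The names decimal_digits, MAX_U64_DIGITS and PRINT_BUF_BYTES are illustrative. The red zone itself can't be expressed in C, since the compiler decides how to use it.

```c
#include <stdint.h>

/* Number of decimal digits in v (at least 1, since 0 prints as "0"). */
unsigned decimal_digits(uint64_t v)
{
    unsigned n = 1;
    while (v >= 10) {
        v /= 10;
        n++;
    }
    return n;
}

/* Worst case: 20 digits + '\n' = 21 bytes, well under the 128-byte red zone. */
enum { MAX_U64_DIGITS = 20, PRINT_BUF_BYTES = MAX_U64_DIGITS + 1 };
```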

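Instead of hard-coding system-call numbers, we can use the constants defined in .h files. When gcc builds a .S file, it runs it through the C preprocessor first, so #include works. Note the mov $__NR_write, %eax near the end of the function. The x86-64 System V ABI passes system-call arguments in almost the same registers as the function-calling convention: RDI, RSI, RDX, R10, R8, R9. R10 replaces RCX because the syscall instruction overwrites RCX. So these are completely different registers from the 32-bit int 0x80 ABI, which uses EBX, ECX, EDX, and so on.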
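
The same system call from C, as a sketch. glibc's generic syscall() wrapper takes the call number from <sys/syscall.h>, where SYS_write is defined as the kernel's __NR_write, and loads the arguments into the registers listed in the comment. raw_write is a hypothetical name; normally you would simply call write().

```c
#define _GNU_SOURCE          /* see syscall(2): needed for the syscall() declaration */
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_write, defined from __NR_write */
#include <unistd.h>

/* On x86-64 Linux the `syscall` instruction takes:
     rax = call number (SYS_write is 1 on x86-64; the question's $4 is the
           i386 number, valid only with int 0x80)
     rdi, rsi, rdx, r10, r8, r9 = arguments 1..6
   and returns in rax (-errno on failure; glibc's wrapper converts that
   to -1 and sets errno). */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```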

#include <asm/unistd_64.h>    // A standard Linux kernel (UAPI) header, installed with the kernel/libc dev headers
// It contains no C code, only #define constants, so we can include it from asm without syntax errors.

.p2align 4
.globl print_uint64             #void print_uint64(uint64_t value)
print_uint64:
    lea   -1(%rsp), %rsi        # We use the 128B red-zone as a buffer to hold the string
                                # a 64-bit integer is at most 20 digits long in base 10, so it fits.

    movb  $'\n', (%rsi)         # store the trailing newline byte.  (Right below the return address).
    # If you need a null-terminated string, leave an extra byte of room and store '\n\0'.  Or  push $'\n'

    mov    $10, %ecx            # same as  mov $10, %rcx  but 2 bytes shorter
    # note that newline (\n) has ASCII code 10, so we could actually have used  movb %cl to save code size.

    mov    %rdi, %rax           # function arg arrives in RDI; we need it in RAX for div
.Ltoascii_digit:                # do{
    xor    %edx, %edx
    div    %rcx                 #  rax = rdx:rax / 10.  rdx = remainder

                                # store digits in MSD-first printing order, working backwards from the end of the string
    add    $'0', %edx           # integer to ASCII.  %dl would work, too, since we know this is 0-9
    dec    %rsi
    mov    %dl, (%rsi)          # *--p = (value%10) + '0';

    test   %rax, %rax
    jnz  .Ltoascii_digit        # } while(value != 0)
    # If we used a loop-counter to print a fixed number of digits, we would get leading zeros
    # The do{}while() loop structure means the loop runs at least once, so we get "0\n" for input=0

    # Then print the whole string with one system call
    mov   $__NR_write, %eax     # SYS_write, from unistd_64.h
    mov   $1, %edi              # fd=1
    # %rsi = start of the buffer
    mov   %rsp, %rdx
    sub   %rsi, %rdx            # length = one_past_end - start
    syscall                     # sys_write(fd=1 /*rdi*/, buf /*rsi*/, length /*rdx*/); 64-bit ABI
    # rax = return value (or -errno)
    # rcx and r11 = garbage (destroyed by syscall/sysret)
    # all other registers = unmodified (saved/restored by the kernel)

    # we don't need to restore any registers, and we didn't modify RSP.
    ret

To test this function, I put this in the same file to call it and exit:

.p2align 4
.globl _start
_start:
    mov    $10120123425329922, %rdi
#    mov    $0, %edi    # Yes, it does work with input = 0
    call   print_uint64

    xor    %edi, %edi
    mov    $__NR_exit, %eax
    syscall                             # sys_exit(0)

I built this into a static binary (with no libc):

$ gcc -Wall -nostdlib print-integer.S && ./a.out 
10120123425329922
$ strace ./a.out  > /dev/null
execve("./a.out", ["./a.out"], 0x7fffcb097340 /* 51 vars */) = 0
write(1, "10120123425329922\n", 18)     = 18
exit(0)                                 = ?
+++ exited with 0 +++
$ file ./a.out 
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=69b865d1e535d5b174004ce08736e78fade37d84, not stripped


Related: Linux x86-32 extended-precision loop that prints 9 decimal digits from each 32-bit "limb": see .toascii_digit: in my Extreme Fibonacci code-golf answer. It's optimized for code-size (even at the expense of speed), but well-commented.

It uses div like you do, because that's smaller than using a fast multiplicative inverse. It uses loop for the outer loop (over multiple integers for extended precision), again for code-size at the cost of speed.

It uses the 32-bit int 0x80 ABI, and prints into a buffer that was holding the "old" Fibonacci value, not the current.
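
The idea of printing 9 decimal digits per 32-bit limb can be sketched in C. This is not the code-golf asm. It assumes each limb holds a value below 1000000000, stored least-significant limb first; the linked code may lay things out differently. format_limbs is a hypothetical name. Every limb except the top one must produce exactly 9 digits, zero-padded, or interior zeros would be lost.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a big unsigned integer stored as base-10^9 limbs
   (limbs[0] least significant, each < 1000000000) into out.
   Returns the length, like snprintf (may exceed out_size on truncation). */
size_t format_limbs(const uint32_t *limbs, size_t n, char *out, size_t out_size)
{
    if (n == 0 || out_size == 0)
        return 0;
    while (n > 1 && limbs[n - 1] == 0)    /* drop zero high limbs */
        n--;

    /* Most-significant limb: no leading zeros. */
    int w = snprintf(out, out_size, "%u", (unsigned)limbs[n - 1]);
    size_t len = (w < 0) ? 0 : (size_t)w;

    /* Lower limbs: exactly 9 digits each, so 5 becomes "000000005". */
    for (size_t i = n - 1; i-- > 0 && len < out_size; ) {
        w = snprintf(out + len, out_size - len, "%09u", (unsigned)limbs[i]);
        if (w < 0)
            break;
        len += (size_t)w;
    }
    return len;
}
```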

Another way to get efficient asm is from a C compiler. For just the loop over digits, look at what gcc or clang produce for this C source (which is basically what the asm is doing). The Godbolt compiler explorer makes it easy to try different options and different compiler versions.

See the gcc 7.2 -O3 asm output, which is nearly a drop-in replacement for the loop in print_uint64 (because I chose the args to go in the same registers):

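void itoa_end(unsigned long val, char *p_end) {
  const unsigned base = 10;
  do {
    *--p_end = (val % base) + '0';   // store digits backwards, ending just before p_end
    val /= base;
  } while(val);                      // do/while, so val == 0 still stores "0"

  // A caller would then do write(1, first_digit, original_end - first_digit);
  // this void version doesn't return the final p_end, so only the loop is shown.
}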

I tested performance on a Skylake i7-6700k by commenting out the syscall instruction and putting a repeat loop around the function call. The version with mul %rcx / shr $3, %rdx is about 5 times faster than the version with div %rcx for storing a long number-string (10120123425329922) into a buffer. The div version ran at 0.25 instructions per clock, while the mul version ran at 2.65 instructions per clock (although requiring many more instructions).

It might be worth unrolling by 2, and doing a divide by 100 and splitting up the remainder of that into 2 digits. That would give a lot better instruction-level parallelism, in case the simpler version bottlenecks on mul + shr latency. The chain of multiply/shift operations that brings val to zero would be half as long, with more work in each short independent dependency chain to handle a 0-99 remainder.
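
The unroll-by-2 idea can be sketched as a C variant of itoa_end. itoa_end_by100 is a hypothetical name, and this sketch has not been benchmarked. The val /= 100 chain is half as long as the val /= 10 chain. Splitting each 0..99 remainder into two digits is independent work that can overlap with the next divide. Unlike itoa_end, it returns the pointer to the first digit so a caller can compute the length.

```c
#include <stdint.h>

/* Store val's decimal digits ending just before p_end, two per iteration.
   Returns a pointer to the most-significant digit. */
char *itoa_end_by100(uint64_t val, char *p_end)
{
    while (val >= 100) {
        unsigned r = (unsigned)(val % 100);   /* 0..99, off the main chain */
        val /= 100;                           /* the critical dependency chain */
        *--p_end = (char)('0' + r % 10);      /* low digit */
        *--p_end = (char)('0' + r / 10);      /* high digit */
    }
    /* 0..99 left: one or two digits (at least one, so 0 prints as "0"). */
    if (val >= 10) {
        *--p_end = (char)('0' + val % 10);
        val /= 10;
    }
    *--p_end = (char)('0' + val);
    return p_end;
}
```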
