How to get length of long strings in x86 assembly to print on assertion

Question

I am trying to build an x86 program that reads a file into memory. It uses a few different syscalls, and messes with memory and such. There's a lot in there to figure out.

To simplify debugging and figuring this out, I want to add assert statements that print a nice error message when there's a mismatch. This is the first step in learning assembly, so that I can print the numbers and strings that end up in different registers after operations. Then I can print them out and debug them without any fancy tools.

Could someone help me write ASSERT and PRINT macros in NASM for Mac x86-64? I have this so far:

%define a rdi
%define b rsi
%define c rdx
%define d r10
%define e r8
%define f r9
%define i rax

%define EXIT 0x2000001
%define EXIT_STATUS 0

%define READ 0x2000003 ; read
%define WRITE 0x2000004 ; write
%define OPEN 0x2000005 ; open(path, oflag)
%define CLOSE 0x2000006 ; CLOSE
%define MMAP 0x2000197 ; mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t offset)

%define PROT_NONE 0x00 ; no permissions
%define PROT_READ 0x01 ; pages can be read
%define PROT_WRITE 0x02 ; pages can be written
%define PROT_EXEC 0x04 ; pages can be executed

%define MAP_SHARED 0x0001 ; share changes
%define MAP_PRIVATE 0x0002 ; changes are private
%define MAP_FIXED 0x0010 ; map addr must be exactly as requested
%define MAP_RENAME 0x0020 ; Sun: rename private pages to file
%define MAP_NORESERVE 0x0040 ; Sun: don't reserve needed swap area
%define MAP_INHERIT 0x0080 ; region is retained after exec
%define MAP_NOEXTEND 0x0100 ; for MAP_FILE, don't change file size
%define MAP_HASSEMAPHORE 0x0200 ; region may contain semaphores

;
; Assert equals.
;

%macro ASSERT 3
  cmp %1, %2
  jne prepare_error
prepare_error:
  push %3
  jmp throw_error
%endmacro

;
; Print to stdout.
;

%macro PRINT 1
  mov c, getLengthOf(%1) ; "rdx" stores the string length
  mov b, %1 ; "rsi" stores the byte string to be used
  mov a, 1 ; "rdi" tells where to write (stdout file descriptor: 1)
  mov i, WRITE ; syscall: write
  syscall
%endmacro

;
; Read file into memory.
;

start:
  ASSERT PROT_READ, 0x01, "Something wrong with PROT_READ"

  mov b, PROT_READ
  mov a, PROT_WRITE
  xor a, b

  mov f, 0
  mov e, -1
  mov d, MAP_PRIVATE
  mov c, a
  mov b, 500000
  mov a, 0
  mov i, MMAP
  syscall
  PRINT "mmap output "
  PRINT i ; check what's returned
  PRINT "\n"
  mov e, i

  mov b, O_RDONLY
  mov a, "Makefile"
  mov i, OPEN
  syscall
  mov a, i

  mov b, e
  mov i, READ
  syscall

;
; Exit status
;

exit:
  mov a, EXIT_STATUS ; exit status
  mov i, EXIT ; syscall: exit
  syscall

throw_error:
  PRINT pop() ; print error or something
  jmp exit

Recommended answer

mov rsi, "abcdefgh" is a mov-immediate of the string contents, not a pointer to it. It only exists as an immediate if you do that.
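To make the distinction concrete, here is a minimal sketch contrasting the two forms. The label `msg` is hypothetical and only introduced for this illustration; the first `mov` is exactly the pattern the question uses with `mov a, "Makefile"`.

```nasm
; Sketch: string bytes as an immediate vs. a pointer to a string in memory.
; `msg` is a hypothetical label, not something from the question's code.
section .rodata
msg:    db  "abcdefgh"          ; the 8 bytes live in memory at `msg`

section .text
    mov     rsi, "abcdefgh"     ; RSI = 0x6867666564636261 (the characters themselves,
                                ; little-endian), not an address
    lea     rsi, [rel msg]      ; RSI = address of the bytes, which is what
                                ; write() and open() expect
```

Passing the first value to a system call hands the kernel eight ASCII bytes reinterpreted as an address, which would most likely fail with an error such as EFAULT rather than print anything.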

Your macro will need to switch to .rodata and back to put the string in memory; possibly you could turn it into a sequence of push-immediate onto the stack with NASM macros, but that sounds hard.

Once the string is in memory, you can use the usual msglen equ $ - msg to get the length (using NASM macro-local labels, so that repeated uses of the macro don't create conflicting symbols).
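Outside a macro, the length idiom looks roughly like this. The message text and the names `msg` / `msglen` are placeholders chosen for the sketch.

```nasm
section .rodata
msg:     db   "assertion failed", 10   ; 16 characters plus a newline (10 = LF)
msglen   equ  $ - msg                  ; assemble-time constant: current position
                                       ; minus start of string = 17
```

`$` is the assembler's current position, so the `equ` line has to come immediately after the `db` it measures. Inside a macro, the same two lines would use `%%`-prefixed names so that each expansion gets its own unique labels.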

See "NASM - Macro local label as parameter to another macro", where I wrote basically this answer a couple of weeks ago. It's not an exact duplicate, though, because that question didn't have the bug of using the string as an immediate.

Anyway, NASM has no support AFAIK for switching sections and then coming back to the current section, like GAS .pushsection. So we're stuck hard-coding section .text unless you want to add an optional parameter for section name.
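As a possible alternative to hard-coding the section, the NASM manual describes a standard macro, `__?SECT?__` (spelled `__SECT__` before NASM 2.15, with the old name kept as an alias as far as I know), that expands to the most recent user-level SECTION directive. The sketch below relies on that behaviour and has not been tested on macOS. The macro name `PRINT_HERE` is hypothetical, and `WRITE` is assumed to be defined as in the question.

```nasm
; Sketch only: emit the string into .rodata, then return to whatever
; section the caller last selected with SECTION/SEGMENT.
%macro PRINT_HERE 1
[section .rodata]               ; primitive bracketed form: does not update __SECT__
    %%str    db  %1
    %%strlen equ $ - %%str
__SECT__                        ; re-issue the caller's last SECTION directive
    mov     edx, %%strlen       ; len
    lea     rsi, [rel %%str]    ; buf
    mov     edi, 1              ; fd = stdout
    mov     eax, WRITE          ; assumed: %define WRITE 0x2000004 as in the question
    syscall
%endmacro
```

This is not a full push/pop mechanism like GAS `.pushsection`: it only restores the single most recent user-level section, which is enough for a macro that makes one temporary switch.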

    ; write(1, string, sizeof(stringarray))
    ; switches to  SECTION .text regardless of previous section
    ; clobbers: RDI, RSI, RDX,   RCX,R11 (by syscall itself)
    ; output: RAX = bytes written, or -errno
%macro PRINT 1
section .rodata 
;; NASM macro-local labels
    %%str    db  %1          ; put the string in read-only memory
    %%strlen equ $ - %%str   ; current position - string start
section .text
  mov     edx, %%strlen           ; len
  lea     rsi, [rel %%str]        ; buf = the string.  (RIP-relative for position-independent)
  mov     edi, 1                  ; fd = stdout
  mov     eax, WRITE
  syscall
%endmacro

This doesn't attempt to combine duplicates of the same string. Using it many times with the same message will be inefficient. This doesn't matter for debugging.
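The answer stops at PRINT, but the question also asked for an assert. One way it might be layered on top of the PRINT macro is sketched below. The name `ASSERT_EQ` is hypothetical, and the sketch assumes an `exit` label like the one in the question.

```nasm
; Hypothetical ASSERT_EQ built on PRINT; not part of the original answer.
; The first operand must be a register or memory operand, because
; `cmp imm, imm` (as in ASSERT PROT_READ, 0x01) is not encodable.
%macro ASSERT_EQ 3
    cmp     %1, %2
    je      %%ok                ; operands match: skip the error path
    PRINT   %3                  ; clobbers RDI, RSI, RDX, RAX, RCX, R11
    jmp     exit                ; assumes an `exit` label as in the question
%%ok:
%endmacro

; Example use (backquoted strings let NASM process the \n escape):
;   ASSERT_EQ rax, 5, `expected rax == 5\n`
```

Unlike the ASSERT in the question, the success case jumps over the error path instead of falling into it, and the message is passed to PRINT as an assemble-time string rather than pushed on the stack.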

I could have left your %defines for RDI, and let NASM optimize mov rdi, 1 (7 bytes) into mov edi, 1 (5 bytes). But YASM won't do that so it's better to make it explicit if you care about anyone building your code with YASM.

I used a RIP-relative LEA because that's the most efficient way to put a static address into a register in position-independent code. In Linux non-PIE executables, use mov esi, %%str instead (5 bytes, and it can run on any ALU execution port, which is more ports than LEA can use). But on OS X, the base virtual address where an executable is mapped/loaded is always above 2^32, and you never want mov r64, imm64 with a 64-bit absolute address.
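For comparison, the three ways of loading a static address mentioned above can be written as follows. The label `msg` is a placeholder, and the three instructions are alternatives shown side by side, not a sequence to run.

```nasm
section .rodata
msg:    db  "hi", 10

section .text
    lea     rsi, [rel msg]      ; 7 bytes (REX.W 8D /r + disp32); position-independent
    mov     esi, msg            ; 5 bytes (BE + imm32); only valid when msg's address
                                ; fits in 32 bits, e.g. a Linux non-PIE executable;
                                ; expected to be rejected for a macho64 target
    mov     rsi, msg            ; 10 bytes (REX.W BE + imm64); 64-bit absolute address
```

The RIP-relative form encodes only a 32-bit offset from the instruction itself, so it is unaffected by where the image is loaded.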

On Linux, where system-call numbers are small integers, you could use lea eax, [rdi-1 + WRITE] to do eax = SYS_write with a 3 byte instruction vs. 5 for mov.
