x86 labels and LEA in GDB

Question

I'm learning to code in x86 assembly (32-bit at the moment) and I'm struggling to understand the memory model completely. Particularly confusing are the semantics of labels, the LEA instruction, and the layout of the executable. I wrote this sample program so I could inspect it running in gdb.

; mem.s
SECTION .data
    msg: db "labeled string\n"
    db "unlabeled-string\n"
    nls: db 10,10,10,10,10,10,10,10
SECTION .text
global  _start
_start:
    ; inspect msg label, LEA instruction
    mov eax, msg
    mov eax, &msg
    mov eax, [msg]
    ; lea eax, msg (invalid instruction)
    lea eax, &msg
    lea eax, [msg]

    ; populate array in BSS section
    mov [arr], DWORD 1
    mov [arr+4], DWORD 2
    mov [arr+8], DWORD 3
    mov [arr+12], DWORD 4

    ; trying to print the unlabeled string
    mov eax, 4
    mov ebx, 1
    lea ecx, [msg+15]
    int 80H

    mov eax, 1      ; exit syscall
    mov ebx, 0      ; return value
    int 80H
SECTION .bss
    arr: resw 16

I assembled and linked it with:

nasm -f elf -g -F stabs mem.s
ld -m elf_i386 -o mem mem.o

GDB session:

(gdb) disas *_start
Dump of assembler code for function _start:
   0x08048080 <+0>: mov    $0x80490e4,%eax
   0x08048085 <+5>: mov    0x80490e4,%eax
   0x0804808a <+10>:    mov    0x80490e4,%eax
   0x0804808f <+15>:    lea    0x80490e4,%eax
   0x08048095 <+21>:    lea    0x80490e4,%eax
   0x0804809b <+27>:    movl   $0x1,0x8049110
   0x080480a5 <+37>:    movl   $0x2,0x8049114
   0x080480af <+47>:    movl   $0x3,0x8049118
   0x080480b9 <+57>:    movl   $0x4,0x804911c
   0x080480c3 <+67>:    mov    $0x4,%eax
   0x080480c8 <+72>:    mov    $0x1,%ebx
   0x080480cd <+77>:    lea    0x80490f3,%ecx
   0x080480d3 <+83>:    int    $0x80
   0x080480d5 <+85>:    mov    $0x1,%eax
   0x080480da <+90>:    mov    $0x0,%ebx
   0x080480df <+95>:    int    $0x80

Inspecting the value of msg:

(gdb) b _start
Breakpoint 1 at 0x8048080
(gdb) run
Starting program: /home/jh/workspace/x86/mem_addr/mem
(gdb) p msg
# what does this value represent?
$1 = 1700946284
(gdb) p &msg
$2 = (<data variable, no debug info> *) 0x80490e4
# this is the address where "labeled string" is stored
(gdb) p *0x80490e4
# same value as above (eg: p msg)
$3 = 1700946284
(gdb) p *msg
Cannot access memory at address 0x6562616c
# NOTE: 0x6562616c is ASCII values of 'e','b','a','l'
# the first 4 bytes from msg: db "labeled string"... little-endian
(gdb) x msg
0x6562616c: Cannot access memory at address 0x6562616c
(gdb) x &msg
0x80490e4 <msg>:    0x6562616c
(gdb) x *msg
Cannot access memory at address 0x6562616c

Stepping one instruction at a time:

(gdb) p $eax
$4 = 0
(gdb) stepi
0x08048085 in _start ()
(gdb) p $eax
$5 = 134516964
(gdb) stepi
0x0804808a in _start ()
(gdb) p $eax
$6 = 1700946284
(gdb) stepi
0x0804808f in _start ()
(gdb) p $eax
$7 = 1700946284
(gdb) stepi
0x08048095 in _start ()
(gdb) p $eax
$8 = 134516964

The array was populated with the values 1,2,3,4 as expected:

# before program execution:
(gdb) x/16w &arr
0x8049104 <arr>:    0   0   0   0
0x8049114:  0   0   0   0
0x8049124:  0   0   0   0
0x8049134:  0   0   0   0
# after program execution
(gdb) x/16w &arr
0x8049104 <arr>:    1   2   3   4
0x8049114:  0   0   0   0
0x8049124:  0   0   0   0
0x8049134:  0   0   0   0

I don't understand why printing a label in gdb results in those two values. Also, how can I print the unlabeled string? Thanks in advance.

Recommended answer

It's somewhat confusing because gdb doesn't really understand the concept of labels. It's designed to debug a program written in a higher-level language (C or C++, generally) and compiled by a compiler. So it tries to map what it sees in the binary to high-level language concepts (variables and types) based on its best guess as to what is going on, in the absence of debug info from the compiler that would tell it.

What nasm does

To the assembler, a label is a value that hasn't been set yet; it actually gets its final value when the linker runs. Generally, labels are used to refer to addresses in sections of memory, and the actual address gets defined when the linker lays out the final executable image. The assembler generates relocation records so that uses of the label can be set properly by the linker.

So when the assembler sees

mov eax, msg

it knows that msg is a label corresponding to an address in the data segment, so it generates an instruction to load that address into eax. When it sees

mov eax, [msg]

it generates an instruction to load 32 bits (the size of register eax) from memory at the address of msg. In both cases, a relocation is generated so that the linker can plug in the final address msg ends up with.
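To make the difference concrete, here is a small Python sketch of what the linker's patching step amounts to for these two instructions. It is a toy model, not how nasm or ld are actually implemented: the `patch_abs32` helper and the all-zero placeholder are made up for illustration. The opcode bytes (B8 for `mov eax, imm32`, A1 for `mov eax, [moffs32]`) are the standard 32-bit encodings, and 0x80490e4 is the address of msg from the disassembly in the question.

```python
import struct

MSG_ADDR = 0x80490E4  # final address of msg, taken from the gdb output

def patch_abs32(code: bytes, offset: int, address: int) -> bytes:
    """Toy relocation: overwrite 4 bytes at `offset` with a little-endian address."""
    return code[:offset] + struct.pack("<I", address) + code[offset + 4:]

# What the assembler emits before linking: an opcode plus a 4-byte placeholder
# (the zero placeholder is a simplification for this sketch).
mov_eax_imm = bytes([0xB8, 0, 0, 0, 0])  # mov eax, msg   -> eax = the address itself
mov_eax_mem = bytes([0xA1, 0, 0, 0, 0])  # mov eax, [msg] -> eax = 4 bytes at the address

linked_imm = patch_abs32(mov_eax_imm, 1, MSG_ADDR)
linked_mem = patch_abs32(mov_eax_mem, 1, MSG_ADDR)

print(linked_imm.hex())  # b8e4900408
print(linked_mem.hex())  # a1e4900408
```

Both instructions carry the same 4 address bytes; only the opcode decides whether the CPU uses them as the value or as the location to read from.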

(Aside: I have no idea what & means to nasm. It doesn't appear anywhere in the documentation I can see, so I'm surprised it doesn't give an error. But it looks like it treats it as an alias for [].)

Now LEA is a funny instruction -- it has basically the same format as a move from memory, but instead of reading memory, it stores the address it would have read from into the destination register. So

lea eax, msg

makes no sense -- the source is the label (address) msg, which is a (link time) constant and is not in memory anywhere.

lea eax, [msg]

works, as the source is in memory, so it sticks the address of the source into eax. This has the same effect as mov eax, msg. Most commonly, you only see lea used with more complex addressing modes, so that you can leverage the x86 address generation unit (AGU) to do useful work other than just computing addresses. E.g.:

lea eax, [ebx+4*ecx+32]

which does a shift and two adds in the AGU and puts the result into eax rather than loading from that address.
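As a rough illustration of that arithmetic, here is a Python sketch of the base + index*scale + displacement calculation. The function name `effective_address` and the sample register values are my own, chosen only for the example.

```python
def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Compute base + index*scale + disp, wrapped to 32 bits as in 32-bit mode."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# lea eax, [ebx+4*ecx+32] with hypothetical values ebx = 0x1000, ecx = 3
ebx, ecx = 0x1000, 3
eax = effective_address(ebx, ecx, 4, 32)
print(hex(eax))  # 0x102c
```

A mov with the same operand would go on to read 4 bytes from 0x102c; lea stops after the calculation and hands back the number.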

What gdb does

In gdb, when you type p <expression> it tries to evaluate <expression> to the best of its understanding of what the C/C++ compiler means for that expression. So when you say

(gdb) p msg

it looks at msg and says "that looks like a variable, so let's get the current value of that variable and print that". It knows that compilers like to put global variables into the .data segment, and that they create symbols for those variables with the same name as the variable. Since it sees msg in the symbol table as a symbol in the .data segment, it assumes that is what is going on, fetches the memory at that symbol, and prints it. It has no idea what type that variable is (there is no debug info), so it guesses that it is a 32-bit int and prints it as that.

So the output

$1 = 1700946284

is the first 4 bytes of msg, treated as an integer.
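You can check this conversion outside gdb. The following Python sketch reinterprets the bytes "labe" as a little-endian 32-bit integer, which is my stand-in for what gdb does when it guesses int; the helper name `as_le_int32` is just for illustration.

```python
def as_le_int32(data: bytes) -> int:
    """Interpret the first 4 bytes of `data` as a little-endian unsigned 32-bit integer."""
    return int.from_bytes(data[:4], "little")

value = as_le_int32(b"labeled string")
print(value)       # 1700946284
print(hex(value))  # 0x6562616c

# Going the other way recovers the characters in memory order.
print(value.to_bytes(4, "little"))  # b'labe'
```

That is also why p *msg and x msg fail: gdb takes this integer and then tries to use it as an address.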

For p &msg it understands you want to take the address of the variable msg, so it gives the address from the symbol directly. When printing addresses, gdb prints the type information it has about those addresses, thus the "data variable, no debug info" that comes out with it.

If you want, you can use a cast to specify the type of something to gdb, and it will use that type instead of what it has guessed:

(gdb) p (char)msg
$6 = 108 'l'
(gdb) p (char [10])msg
$7 = "labeled st"
(gdb) p (char *)&msg
$8 = 0x80490e4 "labeled string\\nunlabeled-string\\n\n\n\n\n\n\n\n\n" <Address 0x804910e out of bounds>

Note in the latter case here, there's no NUL terminator on the string, so it prints out the entire data segment...
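Here is a Python sketch of why that happens. It assumes the .data section holds exactly the three db lines from the question, and that nasm stores the `\n` inside double quotes as two literal characters (which is what the `\\n` in gdb's output suggests). The `c_string` helper is a made-up imitation of how a `char *` is read.

```python
MSG_ADDR = 0x80490E4  # address of msg from the gdb session

# Assumed contents of .data: inside double quotes nasm does not process
# escapes, so "\n" is a backslash followed by the letter n.
data = (
    b"labeled string\\n"      # msg
    + b"unlabeled-string\\n"  # the unlabeled string
    + bytes([10] * 8)         # nls: eight real newline bytes
)

def c_string(memory: bytes, offset: int) -> bytes:
    """Read from `offset` up to (not including) the first NUL, or to the end."""
    end = memory.find(b"\0", offset)
    return memory[offset:] if end == -1 else memory[offset:end]

print(len(c_string(data, 0)))     # 42, the whole section: no NUL anywhere
print(hex(MSG_ADDR + len(data)))  # 0x804910e
```

The second number matches the "Address 0x804910e out of bounds" that gdb reports, so the string really does run to the end of the data. Adding a `db 0` after each string would stop it at the right place.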

To print the unlabeled string with sys_write, you need to figure out the address and length of the string, which you almost have. For completeness you should also check the return value:

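    ; assumes each "\n" is a single newline byte (nasm only processes
    ; escapes inside backquotes); if it is stored literally as '\' 'n',
    ; the offset and length become 16 and 18
    mov ebx, 1           ; fd 1 (stdout)
    lea ecx, [msg+15]    ; address
    mov edx, 17          ; length
write_more:
    mov eax, 4           ; sys_write
    int 80H              ; write(1, &msg[15], 17)
    test eax, eax        ; check for error
    js error             ; error, eax = -ERRNO (error: handler label, not shown)
    add ecx, eax         ; advance past the bytes already written
    sub edx, eax         ; bytes still left to write
    jg write_more        ; only part of the string was written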
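The same retry-until-done logic, sketched in Python in case the control flow is easier to follow there. `write_all` is a name I made up; `os.write` is the thin wrapper over the write syscall and raises OSError where the assembly would jump to the error handler.

```python
import os

def write_all(fd: int, data: bytes) -> int:
    """Call write repeatedly until every byte is out; returns the total written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # Like sys_write, os.write may accept fewer bytes than requested.
        written = os.write(fd, view[total:])
        total += written
    return total

if __name__ == "__main__":
    write_all(1, b"unlabeled-string\n")  # fd 1 is stdout
```

For a short write to a terminal the loop almost always runs once, but pipes and sockets can accept only part of the buffer.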
