x86 labels and LEA in GDB
Question
I'm learning to code in x86 assembly (32-bit at the moment) and I'm struggling to understand the memory model completely. Particularly confusing are the semantics of labels, the LEA instruction, and the layout of the executable. I wrote this sample program so I could inspect it running in gdb.
; mem.s
SECTION .data
msg: db "labeled string\n"
db "unlabeled-string\n"
nls: db 10,10,10,10,10,10,10,10
SECTION .text
global _start
_start:
; inspect msg label, LEA instruction
mov eax, msg
mov eax, &msg
mov eax, [msg]
; lea eax, msg (invalid instruction)
lea eax, &msg
lea eax, [msg]
; populate array in BSS section
mov [arr], DWORD 1
mov [arr+4], DWORD 2
mov [arr+8], DWORD 3
mov [arr+12], DWORD 4
; trying to print the unlabeled string
mov eax, 4
mov ebx, 1
lea ecx, [msg+15]
int 80H
mov eax, 1 ; exit syscall
mov ebx, 0 ; return value
int 80H
SECTION .bss
arr: resw 16
I assembled and linked it with:
nasm -f elf -g -F stabs mem.s
ld -m elf_i386 -o mem mem.o
GDB session:
(gdb) disas *_start
Dump of assembler code for function _start:
0x08048080 <+0>: mov $0x80490e4,%eax
0x08048085 <+5>: mov 0x80490e4,%eax
0x0804808a <+10>: mov 0x80490e4,%eax
0x0804808f <+15>: lea 0x80490e4,%eax
0x08048095 <+21>: lea 0x80490e4,%eax
0x0804809b <+27>: movl $0x1,0x8049110
0x080480a5 <+37>: movl $0x2,0x8049114
0x080480af <+47>: movl $0x3,0x8049118
0x080480b9 <+57>: movl $0x4,0x804911c
0x080480c3 <+67>: mov $0x4,%eax
0x080480c8 <+72>: mov $0x1,%ebx
0x080480cd <+77>: lea 0x80490f3,%ecx
0x080480d3 <+83>: int $0x80
0x080480d5 <+85>: mov $0x1,%eax
0x080480da <+90>: mov $0x0,%ebx
0x080480df <+95>: int $0x80
Inspecting the value of msg:
(gdb) b _start
Breakpoint 1 at 0x8048080
(gdb) run
Starting program: /home/jh/workspace/x86/mem_addr/mem
(gdb) p msg
# what does this value represent?
$1 = 1700946284
(gdb) p &msg
$2 = (<data variable, no debug info> *) 0x80490e4
# this is the address where "labeled string" is stored
(gdb) p *0x80490e4
# same value as above (eg: p msg)
$3 = 1700946284
(gdb) p *msg
Cannot access memory at address 0x6562616c
# NOTE: 0x6562616c is ASCII values of 'e','b','a','l'
# the first 4 bytes from msg: db "labeled string"... little-endian
(gdb) x msg
0x6562616c: Cannot access memory at address 0x6562616c
(gdb) x &msg
0x80490e4 <msg>: 0x6562616c
(gdb) x *msg
Cannot access memory at address 0x6562616c
Stepping through one instruction at a time:
(gdb) p $eax
$4 = 0
(gdb) stepi
0x08048085 in _start ()
(gdb) p $eax
$5 = 134516964
(gdb) stepi
0x0804808a in _start ()
(gdb) p $eax
$6 = 1700946284
(gdb) stepi
0x0804808f in _start ()
(gdb) p $eax
$7 = 1700946284
(gdb) stepi
0x08048095 in _start ()
(gdb) p $eax
$8 = 134516964
The array was populated with the values 1, 2, 3, 4 as expected:
# before program execution:
(gdb) x/16w &arr
0x8049104 <arr>: 0 0 0 0
0x8049114: 0 0 0 0
0x8049124: 0 0 0 0
0x8049134: 0 0 0 0
# after program execution
(gdb) x/16w &arr
0x8049104 <arr>: 1 2 3 4
0x8049114: 0 0 0 0
0x8049124: 0 0 0 0
0x8049134: 0 0 0 0
I don't understand why printing a label in gdb results in those two values. Also, how can I print the unlabeled string? Thanks in advance.
Recommended answer
It's somewhat confusing because gdb doesn't really understand the concept of labels. It's designed to debug a program written in a higher-level language (generally C or C++) and compiled by a compiler. So it tries to map what it sees in the binary to high-level language concepts (variables and types), based on its best guess as to what is going on, in the absence of debug info from the compiler that would tell it.
What nasm does
To the assembler, a label is a value that hasn't been set yet; it actually gets its final value when the linker runs. Generally, labels are used to refer to addresses in sections of memory, and the actual address is defined when the linker lays out the final executable image. The assembler generates relocation records so that the linker can fill in each use of the label properly.
So when the assembler sees
mov eax, msg
it knows that msg is a label corresponding to an address in the data segment, so it generates an instruction to load that address into eax. When it sees
mov eax, [msg]
it generates an instruction to load 32 bits (the size of register eax) from memory at the address of msg. In both cases, a relocation is generated so that the linker can plug in the final address that msg ends up with.
(As an aside, I have no idea what & means to nasm. It doesn't appear anywhere in the documentation I can see, so I'm surprised it doesn't give an error. But it looks like nasm treats it as an alias for [].)
Now, LEA is a funny instruction. It has basically the same format as a move from memory, but instead of reading memory, it stores the address it would have read from into the destination register. So
lea eax, msg
makes no sense: the source is the label (address) msg, which is a (link-time) constant and is not in memory anywhere.
lea eax, [msg]
works, because the source is a memory operand, so lea puts the address of that operand into eax without reading it. This has the same effect as mov eax, msg. Most commonly, you only see lea used with more complex addressing modes, so that you can use the x86 AGU (address generation unit) to do useful work other than just computing addresses. For example:
lea eax, [ebx+4*ecx+32]
which does a shift and two adds in the AGU and puts the result into eax rather than loading from that address.
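As a small illustration of lea used purely for arithmetic, here is a sketch in the same NASM syntax. The file name, register choices, and constants are arbitrary and picked only for this example; it assumes the same 32-bit Linux setup and build commands as the question.

```nasm
; lea_demo.s -- illustrative sketch only, not part of the original program
SECTION .text
global _start
_start:
    mov ebx, 10
    mov ecx, 3
    lea eax, [ebx+4*ecx+32]   ; eax = 10 + 4*3 + 32 = 54, no memory is read
    lea edx, [ebx+ebx*4]      ; edx = ebx*5 = 50, a multiply done by the AGU
    mov ebx, eax              ; use the first result as the exit status
    mov eax, 1                ; sys_exit
    int 80H
```

Stepping through this in gdb with stepi and p $eax should show the computed values appearing in the registers even though none of the bracketed expressions refers to a mapped address.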
What gdb does
In gdb, when you type p <expression>, it tries to evaluate <expression> to the best of its understanding of what a C/C++ compiler would mean by that expression. So when you say
(gdb) p msg
it looks at msg and says "that looks like a variable, so let's get the current value of that variable and print that". It knows that compilers like to put global variables into the .data segment, and that they create symbols for those variables with the same name as the variable. Since it sees msg in the symbol table as a symbol in the .data segment, it assumes that is what is going on, fetches the memory at that symbol, and prints it. It has no idea what type that variable is (there is no debug info), so it guesses that it is a 32-bit int and prints it as that.
So the output
$1 = 1700946284
is the first 4 bytes of msg ('l', 'a', 'b', 'e'), treated as a little-endian 32-bit integer (0x6562616c).
For p &msg, it understands that you want to take the address of the variable msg, so it gives the address from the symbol directly. When printing addresses, gdb also prints the type information it has about them, hence the "data variable, no debug info" that comes out with it.
If you want, you can use a cast to tell gdb the type of something, and it will use that type instead of its guess:
(gdb) p (char)msg
$6 = 108 'l'
(gdb) p (char [10])msg
$7 = "labeled st"
(gdb) p (char *)&msg
$8 = 0x80490e4 "labeled string\\nunlabeled-string\\n\n\n\n\n\n\n\n\n" <Address 0x804910e out of bounds>
Note that in the last case there is no NUL terminator on the string, so it prints out the entire data segment...
To print the unlabeled string with sys_write, you need to figure out the address and length of the string, which you almost have. For completeness, you should also check the return value:
mov ebx, 1 ; fd 1 (stdout)
lea ecx, [msg+15] ; address
mov edx, 17 ; length
write_more:
mov eax, 4 ; sys_write
int 80H ; write(1, &msg[15], 17)
test eax, eax ; check for error
js error ; error, eax = -ERRNO
add ecx, eax
sub edx, eax
jg write_more ; only part of the string was written
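Hand-counted offsets such as msg+15 are easy to get wrong. Judging from the gdb output earlier, which shows \\n in the string, the double-quoted strings appear to have been assembled with a literal backslash and n rather than a newline byte, which shifts every offset by one per string. The following is a sketch of one way to sidestep the counting, assuming it is acceptable to give the second string its own label. The names ustr, ustr_len, error and done are made up for this example. NASM only processes C-style escapes inside backquoted strings, and `$ - label` lets the assembler compute the length.

```nasm
; write_demo.s -- illustrative sketch only; ustr, ustr_len, error, done are hypothetical names
SECTION .data
msg:      db `labeled string\n`      ; backquotes: \n becomes a single byte (10)
ustr:     db `unlabeled-string\n`
ustr_len  equ $ - ustr               ; length computed by the assembler (17 bytes)

SECTION .text
global _start
_start:
    mov ebx, 1            ; fd 1 (stdout)
    mov ecx, ustr         ; address of the string
    mov edx, ustr_len     ; number of bytes still to write
write_more:
    mov eax, 4            ; sys_write
    int 80H
    test eax, eax
    js error              ; negative return value: eax = -errno
    add ecx, eax          ; advance past the bytes already written
    sub edx, eax
    jg write_more         ; loop if only part of the string was written
    mov ebx, 0            ; exit status 0 on success
    jmp done
error:
    mov ebx, 1            ; exit status 1 if the write failed
done:
    mov eax, 1            ; sys_exit
    int 80H
```

It can be assembled and linked with the same nasm and ld commands used in the question. With the label in place, p (char [17])ustr in gdb should also show the string directly.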