Map the address of a string literal to the string literal by parsing an ELF C++ program


Problem description

The addresses of string literals are determined at compile time. Each address, together with the literal itself, can be found in the built executable (in ELF format). For example, the following code outputs String Literal: 0x400674:

printf("String Literal: %p\n", "Hello World");   

objdump -s -j .rodata test1 shows:

Contents of section .rodata:

400670 01000200 48656c6c 6f20576f 726c6400 ....Hello World.

....

So it looks like I can get the virtual address of "Hello World" by reading the executable program itself.

Question: How can I build a table/map/dictionary from the address of each string literal to the string itself by reading the ELF file?

I am trying to write a standalone Python script or C++ program that reads the ELF program and generates the table. It is OK if the table contains extra mappings (entries that are not string literals), as long as it contains the mapping for every string literal.

Recommended answer

I am not sure your question always makes sense. The details are implementation-specific (they depend on the operating system, the compiler, and the compilation flags).

First, a compiler which sees both "abcd" and "cd" literal strings in the same translation unit is permitted (but not required) to share their storage and use "abcd"+2 as the second one. See this answer.
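Because of this possible sharing, a pointer seen at run time may land inside a table entry rather than at its start. A minimal Python sketch of a lookup that tolerates this is shown here; the function name `lookup`, the table layout (start address to text), and the address 0x1000 are assumptions made for illustration, and the code assumes one byte per character (plain ASCII).

```python
from bisect import bisect_right
from typing import Dict, Optional

def lookup(table: Dict[int, str], addr: int) -> Optional[str]:
    """Return the string visible at addr, even if addr points inside an entry."""
    starts = sorted(table)
    i = bisect_right(starts, addr) - 1  # last entry starting at or before addr
    if i < 0:
        return None
    start = starts[i]
    text = table[start]
    offset = addr - start
    # offset == len(text) points at the terminating NUL, i.e. an empty string.
    return text[offset:] if offset <= len(text) else None

# Hypothetical table with a single entry, "abcd" stored at 0x1000.
table = {0x1000: "abcd"}
print(lookup(table, 0x1002))
```

With this approach the table only needs one entry per stored string; a merged literal such as "cd" is recovered as a suffix of "abcd".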

Then, in ELF files, strings are simply initialized read-only data (often in the .rodata or .text section of the text segment), and they could happen to be the same as some non-string constants. ELF files do not keep any typing information (except as debug DWARF information when compiled with -g). In other words, the following

const uint8_t constable[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0 }; // 'h','e','l','l','o','\0'

has exactly the same machine representation as the "hello" string literal, but is not a string in the source. Even worse, some parts of the machine code could happen to look like strings.

BTW, you could use the strings(1) command, or perhaps study its source code and adapt it for your needs.
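As a rough illustration of the strings(1)-style approach, the following Python sketch scans a block of bytes for NUL-terminated runs of printable ASCII and records the address of each. It is not the algorithm strings(1) itself uses; the function name `scan_c_strings`, the minimum length of 4, and the choice of printable characters are assumptions. The sample input is the 16 bytes of .rodata shown in the question. As noted above, such a scan cannot tell real string literals from other data that merely looks like text, so the table may contain extra entries.

```python
import string
from typing import Dict

# Bytes treated as printable text (vertical tab and form feed excluded).
PRINTABLE = frozenset(string.printable.encode("ascii")) - frozenset(b"\x0b\x0c")

def scan_c_strings(data: bytes, base_addr: int, min_len: int = 4) -> Dict[int, str]:
    """Map virtual address -> text for each NUL-terminated printable run."""
    table = {}
    start = None  # offset where the current printable run began, if any
    for offset, byte in enumerate(data):
        if byte in PRINTABLE:
            if start is None:
                start = offset
        else:
            # A run only counts as a C string if a NUL byte terminates it.
            if start is not None and byte == 0 and offset - start >= min_len:
                table[base_addr + start] = data[start:offset].decode("ascii")
            start = None
    return table

# The 16 bytes of .rodata from the question, located at 0x400670.
rodata = bytes.fromhex("01000200" "48656c6c" "6f20576f" "726c6400")
for addr, text in scan_c_strings(rodata, 0x400670).items():
    print(f"{addr:#x}: {text!r}")
```

For the sample bytes this reports "Hello World" at 0x400674, which matches the address printed by the program in the question.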

See also dladdr(3) and this question.

Bear in mind that two different processes have (by definition!) different address spaces in virtual memory. Read also about ASLR. Also string literals may occur in shared objects (e.g. shared libraries like libc.so) which are often mmap-ed in different address segments (so the same literal string would have different addresses in different processes!).
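For a position-independent executable or a shared object, the addresses stored in the file are relative to wherever the loader places the object, so a table built from the file has to be shifted by that load bias before it matches pointers seen in a running process. A small Python sketch of that adjustment follows; the name `rebase` and both numeric values are hypothetical. On Linux the actual base for a given run can be read from /proc/<pid>/maps. For a non-PIE executable such as the one in the question, the bias is 0.

```python
from typing import Dict

def rebase(table: Dict[int, str], load_bias: int) -> Dict[int, str]:
    """Shift every link-time address by the bias the loader applied."""
    return {addr + load_bias: text for addr, text in table.items()}

# Hypothetical values: an offset as recorded in a PIE file, and a base
# address that the loader happened to choose for one particular run.
file_table = {0x674: "Hello World"}
runtime_table = rebase(file_table, 0x555555554000)
print({hex(addr): text for addr, text in runtime_table.items()})
```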

You might be interested in libelf, readelf(1), or bfd for reading the ELF file.
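If none of those libraries is available, the section headers can also be read by hand. The Python sketch below uses only the standard struct module to return the virtual address and raw bytes of one section. It handles only 64-bit little-endian ELF and ignores corner cases such as extended section numbering; the function name `read_section` is an assumption, while the field layout follows the Elf64_Ehdr and Elf64_Shdr structures.

```python
import struct
from typing import Tuple

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr, little-endian

def read_section(elf: bytes, wanted: str = ".rodata") -> Tuple[int, bytes]:
    """Return (virtual address, contents) of the named section."""
    fields = EHDR.unpack_from(elf, 0)
    ident, shoff = fields[0], fields[6]
    shentsize, shnum, shstrndx = fields[11], fields[12], fields[13]
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    headers = [SHDR.unpack_from(elf, shoff + i * shentsize) for i in range(shnum)]
    # Section names are stored in the section-header string table.
    names_off = headers[shstrndx][4]
    for name_idx, _type, _flags, addr, offset, size, *_rest in headers:
        start = names_off + name_idx
        name = elf[start:elf.index(b"\0", start)].decode("ascii")
        if name == wanted:
            return addr, elf[offset:offset + size]
    raise KeyError(wanted)
```

A call such as read_section(open("test1", "rb").read()) would return the base address and bytes of .rodata, which is the input a string scanner needs in order to report virtual addresses rather than file offsets.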
