Finding mapped memory from inside a process

Question

Setup:

  • Ubuntu 18 x64
  • x86_64 application
  • Arbitrary code execution from inside the application

I'm trying to write code that can find structures in memory even with ASLR enabled. Sadly, I couldn't find any static references to those regions, so I'm guessing I have to use the brute-force way and scan the process memory. I tried scanning the whole address space of the application, but that doesn't work because some memory areas aren't allocated and raise SIGSEGV when accessed. Now I'm thinking it would be a good idea to call getpid(), then use the pid to open /proc/$PID/maps and parse the data from there.

But I wonder, is there a better way to identify allocated regions? Maybe even a way that doesn't require me to access libc (=getpid, open, close) or fiddle around with strings?

Recommended answer

I don't think there's any standard POSIX API for this.

Parsing /proc/self/maps is your best bet. (There may be a library to help with this, but I don't know of one.)
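
A minimal sketch of that parsing, assuming Linux with /proc mounted and that using stdio is acceptable. The helper name find_mapping is made up for illustration; only the leading "start-end" hex pair of each line is read, and lines longer than the buffer are simply skipped when their tail fails to parse.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and fills *start / *end if addr lies inside
 * a mapping listed in /proc/self/maps, 0 if it does not, and -1 if the file
 * cannot be opened. */
int find_mapping(const void *addr, uintptr_t *start, uintptr_t *end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[512];
    uintptr_t a = (uintptr_t)addr;
    int found = 0;

    /* Every line begins with "start-end", both addresses in hex. */
    while (fgets(line, sizeof line, f)) {
        uintptr_t lo, hi;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) != 2)
            continue;               /* not the start of a maps line */
        if (a >= lo && a < hi) {    /* end address is exclusive */
            *start = lo;
            *end = hi;
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling find_mapping(&some_local, &lo, &hi) should report the stack mapping, for example. Note that fopen itself may allocate a buffer, so the map can change slightly while you read it.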

You tagged this ASLR, though. If you just want to know where the text / data / bss segments are, you can put labels at the start/end of them so those addresses are available in C. e.g. extern const char bss_end[]; would be a good way to reference a label you put at the end of the BSS using a linker script and maybe some hand-written asm. The compiler-generated asm will use a RIP-relative LEA instruction to get the address in a register relative to the current instruction address (which the CPU knows because it's executing the code mapped there).

Or maybe just a linker script and declaring dummy C variables in custom sections.
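
As a rough illustration of the label idea without writing a linker script at all: this sketch swaps in the edata and end symbols that the default GNU ld script already provides (see end(3)) instead of custom labels, so it is a stand-in for the approach, not the custom-section setup itself. The names in_main_bss and bss_probe are hypothetical.

```c
#include <stdint.h>

/* A zero-initialized global, which normally ends up in .bss.
 * Hypothetical object, only here so there is something to look up. */
char bss_probe[64];

/* Returns 1 if p points between the end of .data and the end of .bss of the
 * main executable, 0 otherwise. */
int in_main_bss(const void *p)
{
    /* Labels from the default GNU ld script. They are not real variables:
     * only their addresses mean anything. In a PIE the compiler reaches
     * them position-independently, so this keeps working under ASLR. */
    extern char edata, end;

    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&edata && a < (uintptr_t)&end;
}
```

With a custom linker script you'd declare your own labels the same way and compare against those instead.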

I'm not sure if you can do that for the stack mapping. With a large environment and/or argv, the initial stack on entry to main() or even _start might not be in the same page as the highest address in the stack mapping.

To scan, you either need to catch SIGSEGV or scan with system calls instead of user-space loads or stores.
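
For the first option, here's a sketch of catching the fault around a single probing load, assuming a single-threaded process and that nobody else depends on the SIGSEGV/SIGBUS handlers while the probe runs. The names can_read_byte and on_fault are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* jump back out of the faulting load */
}

/* Hypothetical helper: 1 if one byte at p can be read, 0 if the load faults.
 * Not thread-safe: it uses a global jump buffer and process-wide handlers. */
int can_read_byte(const void *p)
{
    struct sigaction sa, old_segv, old_bus;
    int ok;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* Second argument 1: save the signal mask so siglongjmp restores it. */
    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*(const volatile char *)p;   /* the probing load */
        ok = 1;
    } else {
        ok = 0;                            /* we got here from on_fault */
    }

    sigaction(SIGSEGV, &old_segv, NULL);   /* put the old handlers back */
    sigaction(SIGBUS, &old_bus, NULL);
    return ok;
}
```

This costs a signal delivery per unmapped page, so it only makes sense for a few probes, not for sweeping a 64-bit address space.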

mmap and mprotect can't query the old setting, so they're not very useful for non-destructive stuff. mmap with a hint but without MAP_FIXED could map a page, and then you could munmap it. If the actual chosen address != hint, then you could assume the address was in use.
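
A sketch of that hinted-mmap probe for a single page, assuming Linux and a page-aligned address. page_in_use is a hypothetical name, and "in use" is only an inference: the kernel may also decline a hint for other reasons (for example very low addresses).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the page at page-aligned addr looks occupied,
 * 0 if a fresh mapping could be placed exactly there, -1 if mmap failed. */
int page_in_use(void *addr)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);

    /* No MAP_FIXED: addr is only a hint, so nothing existing is replaced. */
    void *tmp = mmap(addr, ps, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (tmp == MAP_FAILED)
        return -1;

    munmap(tmp, ps);       /* always undo the probe mapping */
    return tmp != addr;    /* kernel moved it: assume addr was taken */
}
```

Another thread calling mmap at the same moment can race with the probe, so treat the answer as a snapshot.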

Maybe a better option would be to scan with madvise(MADV_NORMAL) and check for ENOMEM (the error madvise(2) reports for addresses that aren't mapped), but only one page at a time.

You could even do this portably with posix_madvise(page, 4096, POSIX_MADV_NORMAL). It returns the error number directly instead of setting errno, so check the return value for ENOMEM: "Addresses in the specified range are partially or completely outside the caller's address space."
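
Something like this, as a sketch: posix_madvise hands back the error number as its return value, so that is what gets compared against ENOMEM. The helper name is made up, the page size comes from sysconf instead of a hard-coded 4096, and the call does reset any earlier advice on that page to "normal".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the page at page-aligned addr is mapped,
 * 0 if posix_madvise reports ENOMEM, -1 for any other error
 * (for example EINVAL for a misaligned address). */
int page_is_mapped_madvise(void *page)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    int rc = posix_madvise(page, ps, POSIX_MADV_NORMAL);

    if (rc == 0)
        return 1;
    if (rc == ENOMEM)
        return 0;
    return -1;
}
```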

On Linux with madvise(2) you could use MADV_DOFORK or something that's even less likely to be at a non-default setting for each page.

But on Linux, an even better choice for read-only querying of the process memory mapping is mincore(2). It also uses the error code ENOMEM for invalid addresses in the queried range: "addr to addr + length contained unmapped memory". (EFAULT is for the result vector pointing to unmapped memory, not addr.)

Only the errno result is useful; the vec result shows you whether pages are hot in RAM or not. (I'm not sure if it shows you which pages are wired into the HW page tables, or if it would count a page that's resident in memory in the pagecache for a memory mapped file but not wired, so an access would trigger a soft page fault).
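
A one-page version of the mincore check might look like this (Linux-specific; the helper name is hypothetical). The vec byte is required by the call but its content is thrown away, since only success versus ENOMEM matters here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the page at page-aligned addr is mapped,
 * 0 if mincore fails with ENOMEM, -1 for any other error. */
int page_is_mapped_mincore(void *page)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char vec;   /* one status byte per page; ignored here */

    if (mincore(page, ps, &vec) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}
```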

You can binary-search for the end of a large mapping by calling mincore with larger lengths.
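
A sketch of that binary search, under a few assumptions: start is page-aligned, the caller supplies an upper bound max_pages, and any mincore failure is treated as "the range contains a hole". mapped_run_pages is a made-up name. Since mincore doesn't care where one mapping ends and an adjacent one begins, what this really finds is the end of the contiguous mapped run.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: number of consecutive mapped pages starting at the
 * page-aligned address start, capped at max_pages. */
size_t mapped_run_pages(void *start, size_t max_pages)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *vec = malloc(max_pages);   /* one byte per page queried */
    if (!vec)
        return 0;

    /* Invariant: the first lo pages are mapped; the answer is in [lo, hi]. */
    size_t lo = 0, hi = max_pages;
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;  /* round up so lo always advances */
        if (mincore(start, mid * ps, vec) == 0)
            lo = mid;        /* the first mid pages are all mapped */
        else
            hi = mid - 1;    /* a hole lies somewhere in the first mid pages */
    }
    free(vec);
    return lo;
}
```

The search is valid because success is monotonic in the length: if the first n pages are all mapped, so is every shorter prefix.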

But unfortunately I don't see any equivalent for finding the next mapping after an unmapped page, which would be much more useful because most of the address-space will be unmapped. Especially in x86-64 with 64-bit addresses!

For sparse files there's lseek(SEEK_DATA). I wonder if that works on Linux's /proc/self/mem? probably not.

So maybe large (like 256MB) (tmp=mmap(page, blah blah)) == page calls would be a good way to scan through unmapped regions looking for mapped pages. Either way you simply munmap(tmp), whether mmap used your hint address or not.
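
Roughly, the coarse scan could look like the sketch below. The function name, the "return 0 for bad arguments" convention, and the choice of PROT_NONE with MAP_NORESERVE (so the probe mapping costs almost nothing) are my assumptions; it also assumes hi + chunk doesn't overflow and that no other thread is mapping memory at the same time.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: walk [lo, hi) in chunk-sized steps and return the
 * start of the first chunk where a hinted mmap was NOT placed at the hint,
 * or hi if every chunk looked free. Returns 0 for invalid arguments. */
uintptr_t first_busy_chunk(uintptr_t lo, uintptr_t hi, size_t chunk)
{
    uintptr_t ps = (uintptr_t)sysconf(_SC_PAGESIZE);
    if (chunk == 0 || chunk % ps != 0 || lo % ps != 0)
        return 0;

    for (uintptr_t a = lo; a < hi; a += chunk) {
        void *tmp = mmap((void *)a, chunk, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (tmp == MAP_FAILED)
            return a;          /* can't rule this chunk out, so report it */
        munmap(tmp, chunk);    /* unmap whether or not the hint was used */
        if ((uintptr_t)tmp != a)
            return a;          /* something already lives in this chunk */
    }
    return hi;
}
```

Once a busy chunk turns up, you'd switch to page-granular checks inside it to find the exact mapping boundaries.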

Parsing /proc/self/maps is almost certainly more efficient.

But the most efficient thing would be putting labels where you want them for static addresses, and tracking dynamic allocations so you already know where your memory is. This works if you have no memory leaks. (glibc malloc might have an API to walk the mappings, but I'm not sure.)

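Note that most system calls will fail with errno=EFAULT if you pass an unmapped address for a parameter that's supposed to point to something the kernel reads or writes. (Calls like mincore, where the address names a range to query rather than a buffer, report ENOMEM instead, as described above.)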

One possible candidate is access(2), which takes a filename and returns an integer. It has zero effect on the state of anything else, success or fail, but the downside is filesystem access if the pointed-to memory is a valid path string. And it's looking for an implicit-length C string so could also be slow if passed a pointer to memory with no 0 byte anywhere soon. I guess ENAMETOOLONG would kick in, but it still definitely reads every accessible page you use it on, faulting it in even if it was paged out.
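
As a sketch of the access(2) trick, with a made-up helper name: every outcome other than EFAULT means the kernel managed to read a string at p. EFAULT can also mean the string ran off the end of readable memory before a 0 byte, and a mapped but unreadable (PROT_NONE) page gives EFAULT too.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: 0 if the kernel faulted while reading a path string
 * at p (EFAULT), 1 for every other outcome. */
int kernel_can_read(const void *p)
{
    if (access((const char *)p, F_OK) == 0)
        return 1;              /* p held the path of an existing file */
    return errno != EFAULT;    /* ENOENT, ENAMETOOLONG, ...: p was readable */
}
```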

If you open a file descriptor on /dev/null, you could make write() system calls with that. Or even use writev(2): writev(devnull_fd, io_vec, count) passes the kernel a vector of pointers in one system call (with lengths of 1 byte each), and you'd get an EFAULT if any of them are bad. But unless the /dev/null driver skips the read early enough, this actually reads from pages that are valid, faulting them in, unlike mincore(). Depending on how it's implemented internally, the /dev/null driver might return success without doing anything, before the pages are ever touched; in that case it's also worth verifying that a bad pointer is still reported as EFAULT. Would be interesting to check.
