Dump global data to disk in assembly code

Problem description

The experiment is on Linux, x86 32-bit.

So suppose that in my assembly program I need to periodically (for instance, after every 100000 executed basic blocks) dump an array in the .bss section from memory to disk. The starting address and size of the array are fixed. The array records the addresses of the executed basic blocks, and its size is 16 MiB right now.

I tried to write some native code to memcpy from the .bss section to the stack and then write it to disk. But that seems very tedious to me, and I am worried about performance and memory consumption, e.g. allocating a very large block of memory on the stack every time...

So here is my question: how can I dump the memory from global data sections in an efficient way? Am I clear enough?

Solution

First of all, don't write this part of your code in asm, esp. not at first. Write a C function to handle this part, and call it from asm. If you need to perf-tune the part that only runs when it's time to dump another 16MiB, you can hand-tune it then. System-level programming is all about checking error returns from system calls (or C stdio functions), and doing that in asm would be painful.

Obviously you can write anything in asm, since making system calls isn't anything special compared to C. And there's no part of any of this that's easier in asm compared to C, except for maybe throwing in an MFENCE around the locking.

Anyway, I've addressed three variations on what exactly you want to happen with your buffer:

  1. Overwrite the same buffer in place (mmap(2) / msync(2))
  2. Append a snapshot of the buffer to a file (with either write(2) or probably-not-working zero-copy vmsplice(2) + splice(2) idea.)
  3. Start a new (zeroed) buffer after writing the old one. mmap(2) sequential chunks of your output file.

In-place overwrites

If you just want to overwrite the same area of disk every time, mmap(2) a file and use that as your array. (Call msync(2) periodically to force the data to disk.) The mmapped method won't guarantee a consistent state for the file, though: the kernel can write dirty pages back to disk at any time, not only when you call msync(2). IDK if there's a way to avoid that with any kind of guarantee (i.e. something better than tuning buffer-flush timers and so on so your pages usually don't get written except by msync(2)).
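A minimal C sketch of the in-place approach, under some assumptions: `map_array_file` and `flush_array` are hypothetical helper names, and the array lives in the mmapped region rather than in .bss, so the asm code would have to use the returned base address. (Mapping the file over a page-aligned .bss array with MAP_FIXED might avoid that, but I haven't tried it.)

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create/open `path`, size it to `len` bytes, and map
 * it shared so that stores into the returned array end up in the file.
 * Returns MAP_FAILED on error. */
void *map_array_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return MAP_FAILED;

    /* The file must be at least `len` bytes long: touching mapped pages
     * that lie beyond end-of-file raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p;
}

/* Block until the dirty pages of the mapping have been written back. */
int flush_array(void *p, size_t len)
{
    return msync(p, len, MS_SYNC);
}
```

Call `flush_array` at each dump point. As noted above, this only forces a write-back; it doesn't stop the kernel from writing pages at other times.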

Append snapshots

The simple way to append a buffer to a file would be to simply call write(2) when you want it written. write(2) does everything you need. If your program is multi-threaded, you might need to take a lock on the data before the system call, and release the lock afterwards. I'm not sure how fast the write system call would return. It may only return after the kernel has copied your data to the page cache.
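Something like the following C function is what I have in mind, as a sketch rather than a finished implementation. `dump_buffer` is a hypothetical name, and it assumes the file descriptor was opened elsewhere (e.g. with O_WRONLY | O_CREAT | O_APPEND). The loop is there because write(2) can return after writing only part of the buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int dump_buffer(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* short write: advance and go around again */
        len -= (size_t)n;
    }
    return 0;
}
```

From 32-bit asm this is an ordinary cdecl call: push the length, the array address and the fd (in that order), call `dump_buffer`, then pop 12 bytes off the stack and check %eax.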

If you just need a snapshot, but all writes into the buffer are atomic transactions (i.e. the buffer is always in a consistent state, rather than pairs of values that need to be consistent with each other), then you don't need to take a lock before calling write(2). There will be a tiny amount of bias in this case (data at the end of the buffer will be from a slightly later time than data from the start, assuming the kernel copies in order).

IDK if write(2) returns slower or faster with direct IO (zero-copy, bypassing the page-cache). open(2) your file with O_DIRECT, then write(2) normally.

There has to be a copy somewhere in the process, if you want to write a snapshot of the buffer and then keep modifying it. Or else MMU copy-on-write trickery:

Zero-copy append snapshots

There is an API for doing zero-copy writes of user pages to disk files. Linux's vmsplice(2) and splice(2) in that order will let you tell the kernel to map your pages into the page cache. Without SPLICE_F_GIFT, I assume it sets them up as copy-on-write. (oops, actually the man page says without SPLICE_F_GIFT, the following splice(2) will have to copy. So IDK if there is a mechanism to get copy-on-write semantics.)

Assuming there was a way to get copy-on-write semantics for your pages, until the kernel was done writing them to disk and could release them:

Further writes might need the kernel to memcpy one or two pages before the data hit disk, but save copying the whole buffer. The soft page faults and page-table manipulation overhead might not be worth it anyway, unless your data access pattern is very spatially-localized over the short periods of time until the write hits disk and the to-be-written pages can be released. (I think an API that works this way doesn't exist, because there's no mechanism for getting the pages released right after they hit disk. Linux wants to take them over and keep them in the page cache.)

I haven't ever used vmsplice, so I might be getting some details wrong.

If there's a way to create a new copy-on-write mapping of the same memory, maybe by mmaping a new mapping of a scratch file (on a tmpfs filesystem, prob. /dev/shm), that would get you snapshots without holding the lock for long. Then you can just pass the snapshot to write(2), and unmap it ASAP before too many copy-on-write page faults happen.

New buffer for every chunk

If it's ok to start with a zeroed buffer after every write, you could mmap(2) successive chunks of the file, so the data you generate is always already in the right place.

  • (optional) fallocate(2) some space in your output file, to prevent fragmentation if your write pattern isn't sequential.
  • mmap(2) your buffer to the first 16MiB of your output file.
  • run normally
  • When you want to move on to the next 16MiB:

    1. take a lock to prevent other threads from using the buffer
    2. munmap(2) your buffer
    3. mmap(2) the next 16MiB of the file to the same address, so you don't need to pass the new address around to writers. These pages will be pre-zeroed, as required by POSIX (can't have the kernel exposing memory).
    4. release the lock

Possibly mmap(buf, 16MiB, ... MAP_FIXED, fd, new_offset) could replace the munmap / mmap pair. MAP_FIXED discards old mappings that it overlaps. I assume this doesn't mean that modifications to the file / shared memory are discarded, but rather that the actual mapping changes, even without an munmap.
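The chunk-switching step could be sketched in C roughly like this. `map_chunk` is a hypothetical helper, it assumes chunks are mapped in increasing order (ftruncate would shrink the file otherwise), and it leaves the locking to the caller. On a 32-bit build you'd want -D_FILE_OFFSET_BITS=64 if the output file can grow past 2 GiB.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map chunk number `chunk` (each `len` bytes) of the
 * output file `fd` at address `addr`, replacing whatever was mapped there.
 * Pass addr == NULL for the first chunk to let the kernel pick an address.
 * `len` must be a multiple of the page size. Returns MAP_FAILED on error. */
void *map_chunk(int fd, void *addr, size_t len, unsigned chunk)
{
    off_t offset = (off_t)chunk * (off_t)len;

    /* Grow the file to cover this chunk; the new bytes read back as zeros. */
    if (ftruncate(fd, offset + (off_t)len) < 0)
        return MAP_FAILED;

    /* MAP_FIXED replaces the old mapping at `addr` in one call. */
    int flags = MAP_SHARED | (addr ? MAP_FIXED : 0);
    return mmap(addr, len, PROT_READ | PROT_WRITE, flags, fd, offset);
}
```

Writers keep using the same address across chunks. Data stored through the old mapping stays in the page cache and is still written back to the file.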
