Concurrently writing to file while reading it out using mmap


Question

The situation is this.

  1. A large buffer of data (which will exceed reasonable RAM consumption) is being generated by the program.

  2. The program concurrently serves a websocket, which allows a web client to specify a small subset of this data buffer to view.

To support the first goal, the file is written using standard methods. (I use portable C stdio fopen and fwrite because they have been shown to be faster than various "pure C++" methods, but that doesn't matter here. Data gets appended to the file; stdio buffers the writes and periodically flushes them.)

To support the second goal (on BSD, in particular iOS), the file is opened (open from sys/fcntl.h -- not as portable as stdio.h) and memory-mapped (mmap from sys/mman.h -- ditto). By deciding to use memory mapping I have to give up some portability in this code. Boost seems like something I could look at to avoid reinventing the wheel.

Anyway, my question is about how exactly I'm supposed to do this, because there will be at least two threads: the main program thread, which appends to the file periodically, and the network (or a worker) thread, which responds to web requests and delivers data read out of the memory regions that are mapped to the file on disk.

Suppose the file starts out 1024 bytes in size and mmap is initially called to map 1024 bytes. As the main thread writes a further 512 bytes into the file, how can the network thread be notified of, or otherwise learn, the current actual size of the file (so that it can munmap and then mmap again with a larger buffer corresponding to the new size)? Furthermore, if I do this naively, I am wary of a situation where the main thread reports that 512 bytes have been written, so the other thread now maps 1536 bytes of the file, but not all of the new 512 bytes have actually been written to disk yet (maybe the OS is still working on writing them). What happens now? Could some garbage show up? Will my program crash?

How can I determine when data has been properly flushed? How can I be notified in a timely fashion after the data has been flushed so that I can memory-map it?

In particular, is calling fflush the only way to guarantee that the file is now up to date with respect to the stream, and can I then guarantee (once fflush returns) that the memory map can access the new size without an access violation? What about fsync?

Recommended answer

When you are using the POSIX API directly in the form of mmap, you should also be using it directly for the writing. The POSIX and libc interfaces just don't play well together.
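One way to picture the mismatch, as a rough sketch rather than anything taken from the original post: bytes passed to fwrite may sit in the stdio buffer, so the size the kernel reports is only certain to include them once fflush has returned. The helper name append_and_flush is made up for illustration. Note that fflush only hands the bytes to the kernel; fsync is the separate call that asks the kernel to push them to storage.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: append through stdio, flush, and return the file
 * size the kernel reports afterwards, or -1 on error. */
static long long append_and_flush(FILE *fp, const void *buf, size_t len)
{
    assert(fp != NULL);

    if (fwrite(buf, 1, len, fp) != len)
        return -1;

    /* Until fflush() returns, some or all of these bytes may still be
     * sitting in the stdio buffer, invisible to fstat() and mmap(). */
    if (fflush(fp) != 0)
        return -1;

    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;

    return (long long)st.st_size;
}
```

The value returned here is a size a reader could safely use when deciding how many bytes to map.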

write is a system call that transfers the data directly to the kernel. It would be slow for writing byte-by-byte, but for writing large buffers it is a tiny fraction faster because it has less overhead (fwrite ends up calling write under the hood anyway). And it is definitely more efficient than fwrite+fflush, because those may end up being two or more calls to write, whereas a direct write is just one.
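A minimal sketch of the direct-write approach, under some assumptions of mine: the names append_all, committed_size and committed_bytes are hypothetical, and the C11 atomic counter is just one possible way for the writer thread to tell the reader thread how many bytes are in the file. The loop is there because write may transfer fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical shared state: bytes the writer has handed to the kernel. */
static _Atomic size_t committed_size = 0;

/* Append len bytes to fd, looping over short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int append_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Publish the new size only after every write() has returned. */
    atomic_fetch_add_explicit(&committed_size, len, memory_order_release);
    return 0;
}

/* Reader side: how many bytes the writer has published so far. */
static size_t committed_bytes(void)
{
    return atomic_load_explicit(&committed_size, memory_order_acquire);
}
```

The file descriptor would typically come from open with O_WRONLY | O_CREAT | O_APPEND. The counter assumes a single writer and a file that starts out empty.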

The documentation of mmap is not very clear about it, but it seems you must not request more bytes than the file actually has.
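In line with that advice, here is a hedged sketch of a reader that never maps more than the file currently holds: it asks fstat for the size and remaps when the size has changed. The struct file_view type and the view_open, view_close and view_refresh helpers are hypothetical names, not part of any library. The sketch also assumes the writer only ever appends, so existing bytes never change under the reader.

```c
#include <assert.h>
#include <fcntl.h>      /* open(), used by callers */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only view of the first len bytes. */
struct file_view {
    void  *addr;   /* NULL when nothing is mapped */
    size_t len;    /* number of mapped bytes */
};

/* Unmap the view, if any, and reset it. */
static void view_close(struct file_view *v)
{
    if (v->addr != NULL)
        munmap(v->addr, v->len);
    v->addr = NULL;
    v->len = 0;
}

/* Map the file read-only at the size fstat() reports right now.
 * Returns 0 on success, -1 on error. An empty file leaves the view
 * unmapped, because mmap() rejects a zero length. */
static int view_open(int fd, struct file_view *v)
{
    assert(v != NULL);
    v->addr = NULL;
    v->len = 0;

    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    v->addr = p;
    v->len = (size_t)st.st_size;
    return 0;
}

/* Remap if the file size differs from the size of the current view. */
static int view_refresh(int fd, struct file_view *v)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if ((size_t)st.st_size == v->len)
        return 0;

    view_close(v);
    return view_open(fd, v);
}
```

A network thread could call view_refresh before serving each request and then read only within the first len bytes. Pointers into the old mapping become invalid after a refresh, so any slice being sent should be copied out or finished first.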
