mmap() for large file I/O?

Question

I'm creating a utility in C++, to be run on Linux, that converts videos to a proprietary format. The video frames are very large (up to 16 megapixels), and we need to be able to seek directly to exact frame numbers, so our file format uses libz to compress each frame individually and appends the compressed data to a file. Once all frames have been written, a journal containing metadata for each frame (including its file offset and size) is written to the end of the file.
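
To make the layout described above concrete, here is a minimal sketch of an in-memory journal that tracks where each compressed frame lands as it is appended. The names `FrameEntry` and `FrameJournal` and the two fields are assumptions for illustration only; the post does not say what the real journal records beyond file offsets and sizes.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical journal record; the actual on-disk fields are not given in the post.
struct FrameEntry {
    std::uint64_t offset;           // byte offset of the compressed frame in the file
    std::uint64_t compressed_size;  // size of the compressed frame in bytes
};

// Hypothetical in-memory journal, assuming frames are appended back to back
// starting at offset 0 and the journal itself is written after the last frame.
class FrameJournal {
public:
    // Record a frame that was just appended; returns its frame number.
    std::size_t append(std::uint64_t compressed_size) {
        entries_.push_back(FrameEntry{next_offset_, compressed_size});
        next_offset_ += compressed_size;
        return entries_.size() - 1;
    }

    // Constant-time lookup of a frame's location by frame number.
    const FrameEntry& at(std::size_t frame) const {
        if (frame >= entries_.size()) {
            throw std::out_of_range("frame number out of range");
        }
        return entries_[frame];
    }

    // Offset at which the journal would be written once all frames are done.
    std::uint64_t journal_offset() const { return next_offset_; }

    std::size_t frame_count() const { return entries_.size(); }

private:
    std::vector<FrameEntry> entries_;
    std::uint64_t next_offset_ = 0;
};
```

Because the lookup is a plain vector index, seeking to frame N costs the same regardless of N; the cost of actually reading the frame then depends on the I/O method used.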

I'm currently using ifstream and ofstream for the file I/O, but I'm looking to optimize as much as possible. I've heard that mmap() can increase performance in a lot of cases, and I'm wondering whether mine is one of them. Our files will be in the tens to hundreds of gigabytes, and although writing will always be done sequentially, random-access reads should take constant time. Any thoughts on whether I should investigate this further, and if so, does anyone have tips on things to look out for?

Thanks!

Recommended answer

On a 32-bit machine your process is limited to 2-3 GB of user address space. This means that (allowing for other memory use) you won't be able to map more than ~1 GB of your file at a time. This does NOT mean that you cannot use mmap() for very large files - just that you need to map only part of the file at a time.

That being said, mmap() can still be a big win for large files. The most significant advantage is that you don't waste memory keeping the data TWICE (one copy in the system cache, one copy in a private buffer of your application), nor CPU time making those copies. It can be an even bigger speedup for random access, but the "random" part must be limited in range to your current mapping(s).
