How to read a huge file in C++

Question

I have a huge file (e.g. 1 TB, or any size that does not fit into RAM) stored on disk. It is space-delimited, and my machine has only 8 GB of RAM. Can I read the file with ifstream? If not, how can I read one block of the file at a time (e.g. 4 GB)?

Recommended answer

There are a couple of things you can do.

First, there is no problem opening a file that is larger than the amount of RAM you have. What you cannot do is load the whole file into memory at once. The best approach is to read a few chunks at a time and process each one before reading the next. You can use ifstream for that (with ifstream::read, for instance). Allocate, say, one megabyte of memory, read the first megabyte of the file into it, process it, then rinse and repeat:

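#include <cstddef>
#include <fstream>
#include <memory>

int main()
{
    std::ifstream bigFile("mybigfile.dat", std::ios::binary);
    constexpr std::size_t bufferSize = 1024 * 1024;
    std::unique_ptr<char[]> buffer(new char[bufferSize]);
    // read() sets failbit on a short final read, so also check gcount()
    while (bigFile.read(buffer.get(), bufferSize) || bigFile.gcount() > 0)
    {
        const std::streamsize bytesRead = bigFile.gcount();
        // process the first bytesRead bytes of buffer
    }
}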

Another solution is to map the file into memory. Most operating systems let you map a file into memory even if it is larger than the amount of physical memory you have. This works because the operating system knows that each memory page associated with the file can be mapped and unmapped on demand: when your program needs a specific page, the OS reads it from the file into your process's memory and evicts a page that hasn't been used in a while.

However, this only works if the file is smaller than the maximum amount of memory your process can theoretically address. That isn't an issue for a 1 TB file in a 64-bit process, but it would not work in a 32-bit process.
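
As a rough illustration of that limit, the sketch below checks whether a file size is even representable as a `std::size_t` in the current process. The function name `couldBeMappedWhole` is hypothetical, and the check is only a necessary condition: the address space actually available for a mapping is smaller than this upper bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a file of `fileSize` bytes could, in
// principle, be addressed in one piece by this process. Passing this check
// does not guarantee that a mapping of that size will succeed.
bool couldBeMappedWhole(std::uintmax_t fileSize)
{
    return fileSize <= std::numeric_limits<std::size_t>::max();
}
```

In C++17 the size could come from `std::filesystem::file_size`. For a 1 TB file the function returns true in a typical 64-bit build and false in a 32-bit build.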

Also be aware of the spirits you're summoning. Memory-mapping a file is not the same thing as reading from it. If another program suddenly truncates the file, your program is likely to crash (on POSIX systems, typically with SIGBUS). If you modify the data and the changes cannot be written back to disk, you may run out of memory. Also, your operating system's algorithm for paging memory in and out may not behave in a way that benefits you significantly. Because of these uncertainties, I would consider mapping the file only if reading it in chunks using the first solution cannot work.

On Linux and OS X, you would use mmap for this. On Windows, you would open the file, then call CreateFileMapping followed by MapViewOfFile.
