Concatenate two huge files in C++

Question

I have two text files, each written with std::ofstream and each over a hundred megabytes in size, and I want to concatenate them. Reading the data through fstreams and holding it in memory to build a single file usually ends in an out-of-memory error because the data is too large.

Is there any way of merging them faster than O(n)?

File 1 (160MB):

0 1 3 5
7 9 11 13
...
...
9187653 9187655 9187657 9187659 

File 2 (120MB):

a b c d e f g h i j
a b c d e f g h j i
a b c d e f g i h j
a b c d e f g i j h
...
...
j i h g f e d c b a

Merged (280MB):

0 1 3 5
7 9 11 13
...
...
9187653 9187655 9187657 9187659 
a b c d e f g h i j
a b c d e f g h j i
a b c d e f g i h j
a b c d e f g i j h
...
...
j i h g f e d c b a

File generation:

std::ofstream a_file("file1.txt");
std::ofstream b_file("file2.txt");

while (/* whatever */) {
    a_file << num << std::endl;
}

while (/* whatever */) {
    b_file << character << std::endl;
}

// merge them here, doesn't matter if output is one of them or a new file
a_file.close();
b_file.close();

Recommended answer

Assuming you don't want to do any processing, and just want to concatenate two files to make a third, you can do this very simply by streaming the files' buffers:

std::ifstream if_a("a.txt", std::ios_base::binary);
std::ifstream if_b("b.txt", std::ios_base::binary);
std::ofstream of_c("c.txt", std::ios_base::binary);

of_c << if_a.rdbuf() << if_b.rdbuf();

I have used this approach on files of up to 100 MB in the past and had no problems. You effectively let C++ and the libraries handle any buffering that's required, so neither file needs to be held in memory in full. It also means that you don't need to track file positions yourself if your files get really big.
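The streaming approach above can be wrapped in a function with basic error checks. This is a minimal sketch, not part of the original answer: the names `append_stream` and `concat_files` are hypothetical, and it assumes the destination is a different file from both inputs. It also guards against one standard-library detail: `operator<<` on a streambuf sets failbit when it inserts no characters, so an empty first file would otherwise block the second one from being written.

```cpp
#include <fstream>
#include <istream>
#include <ostream>
#include <string>

// Hypothetical helper: streams everything left in `in` into `out`.
// An empty input is treated as "nothing to do" because streaming an
// empty streambuf would set failbit on `out`.
bool append_stream(std::ostream& out, std::istream& in) {
    if (in.peek() == std::istream::traits_type::eof()) {
        return !in.bad();
    }
    out << in.rdbuf();
    return static_cast<bool>(out);
}

// Hypothetical helper: writes the contents of `a_path` followed by the
// contents of `b_path` into `dest_path` (assumed to be a third file).
bool concat_files(const std::string& a_path,
                  const std::string& b_path,
                  const std::string& dest_path) {
    std::ifstream in_a(a_path, std::ios_base::binary);
    std::ifstream in_b(b_path, std::ios_base::binary);
    if (!in_a || !in_b) {
        return false;  // one of the inputs could not be opened
    }

    std::ofstream out(dest_path, std::ios_base::binary | std::ios_base::trunc);
    if (!out) {
        return false;  // the output could not be created
    }

    if (!append_stream(out, in_a) || !append_stream(out, in_b)) {
        return false;
    }
    out.flush();
    return static_cast<bool>(out);
}
```

A call such as `concat_files("a.txt", "b.txt", "c.txt")` would then return false if any file cannot be opened or a write fails.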

Alternatively, if you just want to copy b.txt onto the end of a.txt, open a.txt with the append flag and seek to the end (the append flag already positions every write at the end of the file, so the explicit seek is a harmless precaution):

std::ofstream of_a("a.txt", std::ios_base::binary | std::ios_base::app);
std::ifstream if_b("b.txt", std::ios_base::binary);

of_a.seekp(0, std::ios_base::end);
of_a << if_b.rdbuf();
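Because a stream opened with `std::ios_base::app` repositions to the end of the file before every write, the append variant also works without the explicit `seekp`. The following is a minimal sketch under that assumption; the function name `append_file` is hypothetical and the error handling is only illustrative.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends the contents of `src_path` to the end of
// `dest_path`, creating `dest_path` if it does not exist yet.
bool append_file(const std::string& dest_path, const std::string& src_path) {
    std::ifstream src(src_path, std::ios_base::binary);
    if (!src) {
        return false;  // source could not be opened
    }

    // app: every write lands at the current end of the file.
    std::ofstream dest(dest_path, std::ios_base::binary | std::ios_base::app);
    if (!dest) {
        return false;  // destination could not be opened
    }

    // Streaming an empty streambuf would set failbit on `dest`,
    // so an empty source is treated as "nothing to append".
    if (src.peek() == std::ifstream::traits_type::eof()) {
        return !src.bad();
    }

    dest << src.rdbuf();
    dest.flush();
    return static_cast<bool>(dest);
}
```

With the file names from the answer, `append_file("a.txt", "b.txt")` would leave b.txt untouched and grow a.txt by the size of b.txt.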


These methods work by passing the input stream's std::streambuf to the output stream's operator<<, one overload of which takes a streambuf pointer. Provided no errors occur, that overload inserts the characters from the streambuf into the output stream unformatted until it reaches the end of the file.
