Write data from a C++ vector to a text file fast


Question

I would like to know the best way to write data from a vector<string> to a text file quickly, as the data will involve a few million lines.

I have tried ofstream (<<) in C++ as well as fprintf from C, yet the difference in performance between them is small, judging by the time I recorded for generating the required file.

vector<string> OBJdata;

OBJdata = assembleOBJ(pointer, vertexCount, facePointer);

FILE * objOutput;
objOutput = fopen("sample.obj", "wt");
for (size_t i = 0; i < OBJdata.size(); i++)
{
    // write the characters held by the string, not the bytes of the std::string object itself
    fwrite(OBJdata[i].data(), 1, OBJdata[i].length(), objOutput);
}
fclose(objOutput);


Recommended answer

There is no "best". There are only options with different advantages and disadvantages, both of which vary with your host hardware (e.g. writing to a high performance drive will be faster than writing to a slower one), file system, and device drivers (an implementation of a disk driver can trade off performance to increase the chances of data being correctly written to the drive).

Generally, however, manipulating data in memory is faster than transferring it to or from a device like a hard drive. There are limits to this because, with virtual memory, data in physical memory may in some circumstances be paged out to disk.

So, assuming you have sufficient RAM and a fast CPU, consider an approach like this:

// assume your_stream is an object of a type derived from std::ostream
// and THRESHOLD is a large-ish positive integer

std::string buffer;
buffer.reserve(THRESHOLD);
for (std::vector<std::string>::const_iterator i = yourvec.begin(), end = yourvec.end(); i != end; ++i)
{
     if (buffer.length() + i->length() + 1 >= THRESHOLD)
     {
          your_stream << buffer;   // one large write instead of many small ones
          buffer.resize(0);
     }
     buffer.append(*i);
     buffer.append(1, '\n');
}
your_stream << buffer;   // write whatever is left in the buffer

The strategy here is to reduce the number of distinct operations that write to the stream. As a rule of thumb, a larger value of THRESHOLD will reduce the number of distinct output operations but will also consume more memory, so there is usually a sweet spot somewhere in terms of performance. The problem is that the sweet spot depends on the factors I mentioned above (hardware, file system, device drivers, etc). So it is worth some effort to find the sweet spot only if you KNOW the exact hardware and host system configuration your program will run on (or you KNOW that the program will only be executed on a small range of configurations). It is not worth the effort if you don't know these things, since what works well on one configuration will often not work well on another.

Under Windows, you might want to use Win32 API functions to work with the file (CreateFile(), WriteFile(), etc) rather than C++ streams. That might give small performance gains, but I wouldn't hold my breath.
