How to read a 4GB file on a 32-bit system
Question
In my case I have several files; let's assume I have a data file larger than 4GB. I want to read that file line by line and process each line. One of my restrictions is that the software has to run on 32-bit MS Windows, or on 64-bit with a small amount of RAM (min 4GB). You can also assume that processing these lines isn't the bottleneck.
In the current solution I read the file with ifstream and copy each line into a string. Here is a snippet of how it looks:
```cpp
std::ifstream file(filename_xml.c_str());
uintmax_t m_numLines = 0;
std::string str;
while (std::getline(file, str))
{
    m_numLines++;
}
```
OK, that works, but it is too slow. Here is the time for my 3.6 GB of data:
real 1m4.155s
user 0m0.000s
sys 0m0.030s
I'm looking for a method that is much faster than that. For example, I found How to parse space-separated floats in C++ quickly? and I loved the solution presented there using boost::mapped_file, but I ran into another problem: what if my file is too big? In my case a 1GB file was enough to crash the entire process. I have to be careful about how much data is held in memory, because the people who will use this tool probably don't have more than 4GB of RAM installed.
So I found mapped_file in Boost, but how do I use it in my case? Is it possible to read the file piece by piece and still receive complete lines?
Maybe you have another, much better solution. All I need is to process each line.
Thanks,
Bart
Answer
Nice to see you found my benchmark at How to parse space-separated floats in C++ quickly?
It seems you're really looking for the fastest way to count lines (or to do any linear, single-pass analysis). I've done a similar analysis and benchmark of exactly that here:
- Fast textfile reading in c++
Interestingly, you'll see that the most performant code does not need to rely on memory mapping at all there.
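```cpp
#include <fcntl.h>    // open, posix_fadvise, POSIX_FADV_SEQUENTIAL
#include <unistd.h>   // read, close
#include <cstdint>    // uintmax_t
#include <cstdio>     // perror
#include <cstdlib>    // exit, EXIT_FAILURE
#include <cstring>    // memchr

#define handle_error(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

static uintmax_t wc(char const *fname)
{
    static const auto BUFFER_SIZE = 16*1024;
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    /* Advise the kernel of our access pattern. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[BUFFER_SIZE + 1];
    uintmax_t lines = 0;

    // read() returns 0 at end of file, which ends the loop; -1 signals an error.
    while (ssize_t bytes_read = read(fd, buf, BUFFER_SIZE))
    {
        if (bytes_read == -1)
            handle_error("read failed");

        // Count every '\n' in the chunk that was just read.
        for (char *p = buf; (p = (char*) memchr(p, '\n', (buf + bytes_read) - p)); ++p)
            ++lines;
    }

    close(fd);
    return lines;
}
```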
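The `wc` function above depends on POSIX calls (`open`, `read`, `posix_fadvise`), and `posix_fadvise` in particular is not available under MSVC on 32-bit Windows, which the question targets. It also only counts lines. What follows is a minimal sketch of the same fixed-buffer, single-pass idea written in standard C++ only (`std::ifstream::read` plus `memchr`), extended so that every complete line is passed to a callback. The name `for_each_line`, the callback signature and the 64 KiB default buffer size are hypothetical choices for illustration and are not part of the original answer. Memory use stays bounded by the buffer plus the longest line, whatever the file size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: reads `fname` in fixed-size chunks and calls `on_line`
// once per line (without the trailing '\n'). Returns the number of lines seen.
// A final line that has no trailing '\n' is still reported.
std::uintmax_t for_each_line(const std::string& fname,
                             const std::function<void(const char*, std::size_t)>& on_line,
                             std::size_t buffer_size = 64 * 1024)
{
    assert(buffer_size > 0);
    std::ifstream file(fname, std::ios::binary);
    if (!file)
        return 0;  // sketch only: real code should report the error

    std::vector<char> buf(buffer_size);
    std::string carry;          // incomplete line left over from the previous chunk
    std::uintmax_t lines = 0;

    while (file) {
        file.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::size_t n = static_cast<std::size_t>(file.gcount());
        if (n == 0)
            break;

        const char* p = buf.data();
        const char* end = p + n;
        while (const char* nl = static_cast<const char*>(
                   std::memchr(p, '\n', static_cast<std::size_t>(end - p)))) {
            if (carry.empty()) {
                on_line(p, static_cast<std::size_t>(nl - p));  // line lies entirely in this chunk
            } else {
                carry.append(p, nl);                            // finish a line that spanned chunks
                on_line(carry.data(), carry.size());
                carry.clear();
            }
            ++lines;
            p = nl + 1;
        }
        carry.append(p, end);   // keep the incomplete tail for the next chunk
    }

    if (!carry.empty()) {       // last line had no trailing '\n'
        on_line(carry.data(), carry.size());
        ++lines;
    }
    return lines;
}
```

The pointer passed to the callback is only valid for the duration of that call, so copy the data if you need to keep it. Because the file is opened in binary mode, lines in a Windows (CRLF) file will still end in `'\r'`, which the callback may need to strip.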