How to read a 4GB file on a 32-bit system


Question

In my case I have various files; let's assume I have a data file larger than 4GB. I want to read that file line by line and process each line. One of my restrictions is that the software has to run on 32-bit MS Windows, or on 64-bit with a small amount of RAM (4GB minimum). You can also assume that processing the lines is not the bottleneck.

In my current solution I read the file with std::ifstream and copy each line into a string. Here is a snippet showing how it looks:

std::ifstream file(filename_xml.c_str());
uintmax_t m_numLines = 0;
std::string str;
while (std::getline(file, str))
{
    m_numLines++;
}

That works, but too slowly. Here are the timings for my 3.6 GB of data:

real    1m4.155s
user    0m0.000s
sys     0m0.030s

I'm looking for a method that is much faster than that. For example, I found How to parse space-separated floats in C++ quickly? (http://stackoverflow.com/questions/17465061/how-to-parse-space-separated-floats-in-c-quickly/17479702#17479702) and I loved the solution presented there using boost::mapped_file, but I ran into another problem: what if my file is too big? In my case a 1GB file was enough to bring down the entire process. I have to be careful about how much data is held in memory, because the people who will use this tool probably do not have more than 4 GB of RAM installed.

So I found mapped_file from Boost, but how do I use it in my case? Is it possible to read the file in parts and still get these lines?

Maybe you have another, much better solution. I just have to process each line.

Thanks,
Bart

Recommended answer

Nice to see you found my benchmark at How to parse space-separated floats in C++ quickly? (http://stackoverflow.com/questions/17465061/how-to-parse-space-separated-floats-in-c-quickly/17479702#17479702)

It seems you are really looking for the fastest way to count lines (or to do any linear single-pass analysis). I have done a similar analysis and benchmark of exactly that here:

  • Fast textfile reading in c++

Interestingly, you'll see that the most performant code there does not need to rely on memory mapping at all.

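#include <fcntl.h>      // open, posix_fadvise, POSIX_FADV_SEQUENTIAL
#include <unistd.h>     // read, close
#include <cstdint>      // uintmax_t
#include <cstdio>       // perror
#include <cstdlib>      // exit
#include <cstring>      // memchr

// Minimal error handler; the original snippet assumed one was defined elsewhere.
static void handle_error(char const *msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

static uintmax_t wc(char const *fname)
{
    static const auto BUFFER_SIZE = 16*1024;
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    /* Advise the kernel of our access pattern.  */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[BUFFER_SIZE + 1];
    uintmax_t lines = 0;

    // read() returns 0 at end of file and -1 on error.
    while (ssize_t bytes_read = read(fd, buf, BUFFER_SIZE))
    {
        if (bytes_read == -1)
            handle_error("read failed");

        // Count every newline in the chunk that was just read.
        for (char *p = buf; (p = (char*) memchr(p, '\n', (buf + bytes_read) - p)); ++p)
            ++lines;
    }

    close(fd);
    return lines;
}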
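
The function above only counts newlines and uses POSIX calls, while the question asks for each line's content on Windows. As a rough sketch of the same fixed-buffer idea in portable C++, the code below reads the file in chunks through std::ifstream and carries an unfinished line over to the next chunk. The names for_each_line, on_line and chunk_size are hypothetical and not part of the original answer. Memory use stays bounded by the chunk size plus the longest line, so the file size itself should not matter.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): reads `path` in
// fixed-size chunks and calls `on_line` once per line, without the '\n'.
// The file is opened in binary mode, so a '\r' from CRLF line endings
// stays at the end of each line. Returns the number of lines delivered.
std::uintmax_t for_each_line(const std::string& path,
                             const std::function<void(const std::string&)>& on_line,
                             std::size_t chunk_size = 1 << 20)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    std::vector<char> buf(chunk_size);
    std::string carry;          // unfinished line left over from earlier chunks
    std::uintmax_t count = 0;

    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;

        const char* p = buf.data();
        const char* end = p + got;
        while (p < end) {
            const char* nl = static_cast<const char*>(
                std::memchr(p, '\n', static_cast<std::size_t>(end - p)));
            if (!nl) {                  // line continues in the next chunk
                carry.append(p, end);
                break;
            }
            carry.append(p, nl);        // complete the current line
            on_line(carry);
            ++count;
            carry.clear();
            p = nl + 1;
        }
    }

    if (!carry.empty()) {               // last line without a trailing '\n'
        on_line(carry);
        ++count;
    }
    return count;
}
```

A caller would pass a lambda that does the per-line processing, for example one that parses the line or just increments a counter. Copying every line into carry keeps the sketch simple; if profiling shows it matters, lines that lie entirely inside one chunk could be handed over as a pointer and length instead.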
