How to read 4GB file on 32bit system


Question

In my case I have various files; let's assume I have a >4GB data file. I want to read that file line by line and process each line. One of my constraints is that the software has to run on 32-bit MS Windows, or on 64-bit with a small amount of RAM (4 GB minimum). You can also assume that processing these lines isn't the bottleneck.

In the current solution I read the file with ifstream and copy each line into a string. Here is a snippet of how it looks:

std::ifstream file(filename_xml.c_str());
uintmax_t m_numLines = 0;
std::string str;
while (std::getline(file, str))
{
    m_numLines++;
}

OK, that works, but it's too slow. Here is the time for my 3.6 GB of data:

real    1m4.155s
user    0m0.000s
sys     0m0.030s

I'm looking for a method that will be much faster than that. For example, I found How to parse space-separated floats in C++ quickly? and I liked the solution presented there using boost::mapped_file, but I ran into another problem: what if my file is too big? In my case a file 1 GB large was enough to kill the entire process. I have to be careful about how much data is held in memory, since the people who will be using this tool probably don't have more than 4 GB of RAM installed.

So I found mapped_file from boost, but how do I use it in my case? Is it possible to read the file partially and still receive complete lines?
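One way to read the file partially and still receive complete lines, without memory mapping at all, is to read fixed-size chunks and carry any partial line over to the next chunk. The sketch below is an illustration of that idea, not part of the original question; the function name `for_each_line` and the 1 MiB chunk size are my own choices. Memory use stays bounded by the chunk size plus the longest line:

```cpp
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Read `filename` in fixed-size chunks, reassembling lines that span
// chunk boundaries, and invoke `on_line` for each complete line.
// Returns the number of lines seen.
static uintmax_t for_each_line(const std::string& filename,
                               const std::function<void(const std::string&)>& on_line,
                               std::size_t chunk_size = 1 << 20)  // 1 MiB window
{
    std::ifstream file(filename, std::ios::binary);
    std::vector<char> buf(chunk_size);
    std::string carry;  // holds a partial line left over from the previous chunk
    uintmax_t lines = 0;

    while (file) {
        file.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = file.gcount();
        if (got <= 0)
            break;

        std::size_t start = 0;
        for (std::size_t i = 0; i < static_cast<std::size_t>(got); ++i) {
            if (buf[i] == '\n') {
                carry.append(buf.data() + start, i - start);
                on_line(carry);
                carry.clear();
                ++lines;
                start = i + 1;
            }
        }
        // Stash the tail of this chunk; it is completed by the next chunk.
        carry.append(buf.data() + start, static_cast<std::size_t>(got) - start);
    }
    if (!carry.empty()) {  // final line without a trailing newline
        on_line(carry);
        ++lines;
    }
    return lines;
}
```

This never holds more than one chunk plus one line in memory, so it fits the 4 GB RAM constraint regardless of file size.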

Maybe you have another, much better solution. I just have to process each line.

Thanks,
Bart

Answer

Nice to see that you found How to parse space-separated floats in C++ quickly?

It seems you're really looking for the fastest way to count lines (or perform any linear single-pass analysis). I've done a similar analysis, and a benchmark of exactly that, here.

Interestingly, you'll see that the most performant code there doesn't need to rely on memory mapping at all.

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

static void handle_error(const char* msg)
{
    perror(msg);
    exit(EXIT_FAILURE);
}

static uintmax_t wc(char const *fname)
{
    static const auto BUFFER_SIZE = 16*1024;
    int fd = open(fname, O_RDONLY);
    if(fd == -1)
        handle_error("open");

    /* Advise the kernel of our access pattern.  */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[BUFFER_SIZE + 1];
    uintmax_t lines = 0;

    while(size_t bytes_read = read(fd, buf, BUFFER_SIZE))
    {
        if(bytes_read == (size_t)-1)
            handle_error("read failed");
        if (!bytes_read)
            break;

        for(char *p = buf; (p = (char*) memchr(p, '\n', (buf + bytes_read) - p)); ++p)
            ++lines;
    }

    close(fd);
    return lines;
}
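Note that the snippet above uses POSIX calls (`open`, `read`, `posix_fadvise`) that aren't available on MS Windows, which the question requires. As a hedged, portable sketch of the same technique (not part of the original answer), here is an equivalent using std::ifstream: a fixed 16 KiB buffer and memchr() to count newlines, so memory use is constant regardless of file size. The name `wc_portable` is my own:

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

// Portable variant of wc(): count '\n' bytes using a fixed-size buffer
// and std::ifstream instead of the POSIX open()/read() calls.
static uintmax_t wc_portable(const char* fname)
{
    static const std::size_t BUFFER_SIZE = 16 * 1024;
    std::ifstream file(fname, std::ios::binary);
    std::vector<char> buf(BUFFER_SIZE);
    uintmax_t lines = 0;

    while (file) {
        file.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = file.gcount();
        if (got <= 0)
            break;
        // Scan the chunk for newlines; memchr is typically heavily optimized.
        for (const char* p = buf.data();
             (p = static_cast<const char*>(
                  std::memchr(p, '\n', (buf.data() + got) - p)));
             ++p)
            ++lines;
    }
    return lines;
}
```

You lose the `posix_fadvise` hint this way, but the sequential buffered scan is the part that matters for throughput.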

