Speed up integer reading from file in C++

Question

I'm reading a file, line by line, and extracting integers from it. Some noteworthy points:

  • the input file is plain text, not binary;
  • I cannot load the whole file into memory;
  • file format (only integers, separated by some delimiter):

x1 x2 x3 x4 ...
y1 y2 y3 ...
z1 z2 z3 z4 z5 ...
...

For context, I'm reading the integers and counting how often each value occurs, using a std::unordered_map<unsigned int, unsigned int>.
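
To make the workload concrete, here is a minimal sketch of the counting step on top of the getline + stringstream approach timed below. It assumes whitespace-separated non-negative integers that fit in `unsigned int`; `count_with_stringstream` is a hypothetical name, not code from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: counts every unsigned integer in `in`, one line at a
// time, with std::getline + a per-line string stream.
std::unordered_map<unsigned int, unsigned int> count_with_stringstream(std::istream& in)
{
    std::unordered_map<unsigned int, unsigned int> counts;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);  // one stream object per line
        unsigned int item;
        while (ss >> item) {
            ++counts[item];  // operator[] value-initialises a missing count to 0
        }
    }
    return counts;
}
```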

Simply looping through the lines and constructing a throwaway stringstream for each one, like this:

std::string line;
std::fstream infile(<inpath>, std::ios::in);
while (std::getline(infile, line)) {
    std::stringstream ss(line);  // constructed and discarded; nothing is parsed
}

gives me ~2.7s for a 700MB file.
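
The question does not show how these timings were taken. One simple way to time each variant is a small wrapper around `std::chrono::steady_clock`; this is a minimal sketch, and `time_seconds` is a hypothetical helper, not part of the question.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <class F>
double time_seconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, `double s = time_seconds([&] { /* one of the read loops */ });`. Running each variant more than once can help separate parsing cost from the first, cold-cache read of the file.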

Parsing each line:

std::string line;
unsigned int item;
std::fstream infile(<inpath>, std::ios::in);
while (std::getline(infile, line)) {
    std::stringstream ss(line);
    while (ss >> item);  // extract every integer on the line (values discarded here)
}

Gives me ~17.8s for the same file.

If I replace operator>> with std::getline + atoi:

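std::string line, token;
unsigned int item;
std::fstream infile(<inpath>, std::ios::in);
while (std::getline(infile, line)) {
    std::stringstream ss(line);
    // split on single spaces, then convert each token with atoi (from <cstdlib>)
    while (std::getline(ss, token, ' ')) item = atoi(token.c_str());
}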

Gives me ~14.6s.

Is there anything faster than these approaches? I don't think it's necessary to speed up the file reading, just the parsing itself; speeding up both wouldn't hurt, though (:
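
One variant not timed in the question is C++17's `std::from_chars`, which converts digits straight from the line's characters instead of first copying each token into a `std::string` for `atoi`. The sketch below has not been benchmarked against the 700MB file; it assumes a C++17 compiler, non-negative integers, and that any non-digit character acts as a delimiter. `count_line_from_chars` is a hypothetical name.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>
#include <unordered_map>

// Hypothetical helper: counts the unsigned integers in one line using
// std::from_chars, with no stringstream and no per-token std::string.
void count_line_from_chars(std::string_view line,
                           std::unordered_map<unsigned int, unsigned int>& counts)
{
    const char* p = line.data();
    const char* const end = p + line.size();
    while (p != end) {
        if (*p < '0' || *p > '9') {  // skip delimiters (spaces, '\r', ...)
            ++p;
            continue;
        }
        unsigned int value = 0;
        auto [next, ec] = std::from_chars(p, end, value);
        if (ec == std::errc()) {
            ++counts[value];
        }
        // Even on result_out_of_range, `next` points past the digits,
        // so the loop always makes progress.
        p = next;
    }
}
```

It would be called once per line produced by `std::getline`, with the same counts map shared across calls.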

Answer

This program

#include <iostream>
int main ()
{
    int num;
    while (std::cin >> num) ;
}

needs about 17 seconds to read a file. This code

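#include <iostream>

int main()
{
    int lc = 0;      // line count
    int item = 0;    // number being parsed; kept outside the loop so a number
                     // split across two chunks is still assembled correctly
    char buf[2048];
    do
    {
        // read raw bytes in fixed-size chunks instead of formatted extraction
        std::cin.read(buf, sizeof(buf));
        std::streamsize k = std::cin.gcount();  // bytes actually read; the last chunk may be short
        for (std::streamsize i = 0; i < k; ++i)
        {
            switch (buf[i])
            {
                case '\r':
                    break;
                case '\n':
                    // end of a number and of a line; a real program would use item here
                    item = 0; lc++;
                    break;
                case ' ':
                    // end of a number; a real program would use item here
                    item = 0;
                    break;
                case '0': case '1': case '2': case '3':
                case '4': case '5': case '6': case '7':
                case '8': case '9':
                    item = 10*item + buf[i] - '0';  // append one decimal digit
                    break;
                default:
                    std::cerr << "Bad format\n";
            }
        }
    } while (std::cin);
}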

needs 1.25 seconds for the same file. Make of that what you will...
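
The answer's loop resets `item` at each delimiter without using it. Below is a minimal sketch of the same fixed-buffer technique that actually records each value in a `std::unordered_map`, as the question requires. It assumes non-negative integers that fit in `unsigned int`, treats every non-digit character as a delimiter instead of printing `Bad format`, and also counts a final number that has no trailing newline. `count_chunked` and the 64 KiB buffer size are assumptions, not part of the answer.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for testing with in-memory input
#include <unordered_map>

// Hypothetical helper: reads `in` in fixed-size chunks and counts each
// unsigned integer it finds.
std::unordered_map<unsigned int, unsigned int> count_chunked(std::istream& in)
{
    std::unordered_map<unsigned int, unsigned int> counts;
    char buf[1 << 16];           // buffer size is an arbitrary choice
    unsigned int item = 0;       // persists across chunks
    bool in_number = false;      // true while digits of a number are being read
    do {
        in.read(buf, sizeof(buf));
        const std::streamsize k = in.gcount();
        for (std::streamsize i = 0; i < k; ++i) {
            const char c = buf[i];
            if (c >= '0' && c <= '9') {
                item = 10 * item + static_cast<unsigned int>(c - '0');
                in_number = true;
            } else if (in_number) {  // any non-digit ends the current number
                ++counts[item];
                item = 0;
                in_number = false;
            }
        }
    } while (in);
    if (in_number) {             // last number when the input has no trailing newline
        ++counts[item];
    }
    return counts;
}
```

With the question's file, `in` would be a `std::ifstream` opened on the input path; no timing is claimed for this variant.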
