How can I speed up line by line reading of an ASCII file? (C++)


Problem description


Here's a bit of code that turned out to be a considerable bottleneck after some measuring:

#include <fstream>
#include <string>
#include <unordered_set>

using std::ifstream;
using std::string;
using std::unordered_set;

//-----------------------------------------------------------------------------
//  Construct dictionary hash set from dictionary file
//-----------------------------------------------------------------------------
void constructDictionary(unordered_set<string> &dict)
{
    ifstream wordListFile;
    wordListFile.open("dictionary.txt");

    std::string word;
    while( wordListFile >> word )
    {
        if( !word.empty() )
        {
            dict.insert(word);
        }
    }

    wordListFile.close();
}

I'm reading in ~200,000 words and this takes about 240 ms on my machine. Is the use of ifstream here efficient? Can I do better? I'm reading about mmap() implementations but I'm not understanding them 100%. The input file is simply text strings with *nix line terminations.
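One low-effort variant worth trying before mmap() is simply giving the ifstream a larger internal buffer. The sketch below is an assumption on my part, not code from the question: whether pubsetbuf honours a request made before open() is implementation-defined (it works with libstdc++), and the 1 MiB size is arbitrary.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <unordered_set>
#include <vector>

// Same loop as the question, but with an enlarged stream buffer.
// pubsetbuf must be called before open() to take effect with libstdc++.
void constructDictionaryBuffered(std::unordered_set<std::string> &dict)
{
    std::vector<char> buf(1 << 20);   // 1 MiB buffer; size is an arbitrary choice
    std::ifstream wordListFile;
    wordListFile.rdbuf()->pubsetbuf(buf.data(), buf.size());
    wordListFile.open("dictionary.txt");

    std::string word;
    while (wordListFile >> word)
    {
        if (!word.empty())
            dict.insert(word);
    }
}
```

The buffer is declared before the stream so it outlives it inside the function; the stream must not touch the buffer after it is destroyed.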

EDIT: Follow-up question for the alternatives being suggested: Would any alternative (other than increasing the stream buffer sizes) imply that I write a parser that examines each character for new-lines? I kind of like the simple syntax of streams, but I can re-write something more nitty-gritty if I have to for speed. Reading the entire file into memory is a viable option; it's only about 2 MB.
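The read-everything-then-split option mentioned above can be sketched roughly as follows; the function name is mine, and the code assumes *nix line endings as the question states. One bulk read replaces all per-token stream extraction, and the "parser" is just std::string::find on '\n'.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_set>

// Slurp the whole file into one string, then split it on newlines.
std::unordered_set<std::string> loadDictionary(const std::string &path)
{
    std::ifstream file(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << file.rdbuf();               // one bulk read into memory
    const std::string data = buffer.str();

    std::unordered_set<std::string> dict;
    std::string::size_type start = 0;
    while (start < data.size())
    {
        std::string::size_type end = data.find('\n', start);
        if (end == std::string::npos)
            end = data.size();
        if (end > start)                  // skip empty lines
            dict.insert(data.substr(start, end - start));
        start = end + 1;
    }
    return dict;
}
```

For a ~2 MB file this keeps the memory cost trivial while avoiding any per-character stream machinery.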

EDIT #2: I've found that the slowdown for me was due to the set insert, but for those who are still interested in speeding up line-by-line file IO, please read the answers here AND check out Matthieu M.'s continuation on the topic.

Solution

Quick profiling on my system (linux-2.6.37, gcc-4.5.2, compiled with -O3) shows that I/O is not the bottleneck. Whether using fscanf into a char array followed by dict.insert() or operator>> as in your exact code, it takes about the same time (155 - 160 ms to read a 240k word file).
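The fscanf variant timed above might look like the sketch below; the function name and the 64-byte buffer size are my assumptions, not the answerer's code. %63s skips whitespace (including newlines) and reads one token per call, mirroring operator>>.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unordered_set>

// C stdio variant of the question's loop: fscanf into a char array,
// then insert into the set. Size the buffer for the longest word expected.
void constructDictionaryFscanf(std::unordered_set<std::string> &dict)
{
    std::FILE *f = std::fopen("dictionary.txt", "r");
    if (!f)
        return;

    char word[64];
    while (std::fscanf(f, "%63s", word) == 1)
    {
        dict.insert(word);   // constructs a std::string from the char array
    }
    std::fclose(f);
}
```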

Replacing gcc's std::unordered_set with std::vector<std::string> in your code drops the execution time to 45 ms (fscanf) - 55 ms (operator>>) for me. Try to profile IO and set insertion separately.
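As a rough illustration of that substitution (the function name and the capacity hint are mine): the same extraction loop feeds a std::vector<std::string>, which skips hashing and duplicate checks entirely, isolating the IO cost from the set-insertion cost.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Same loop as the question, but collecting into a vector instead of
// an unordered_set, so only IO and string construction are measured.
std::vector<std::string> readWords(const std::string &path)
{
    std::vector<std::string> words;
    words.reserve(200000);        // rough capacity hint for a ~200k-word file

    std::ifstream file(path);
    std::string word;
    while (file >> word)
        words.push_back(word);
    return words;
}
```

Timing this against the original constructDictionary separates the two costs the paragraph above describes.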

