Improving C++'s reading file line by line?


Problem description

I am parsing a ~500GB log file. My C++ version takes 3.5 minutes and my Go version takes 1.2 minutes.

I am using C++ streams to read the file one line at a time for parsing.

#include <fstream>
#include <string>
#include <iostream>

int main( int argc , char** argv ) {
   int linecount = 0 ;
   std::string line ;
   std::ifstream infile( argv[ 1 ] ) ;
   if ( infile ) {
      while ( getline( infile , line ) ) {
          linecount++ ;
      }
      std::cout << linecount << ": " << line << '\n' ;
   }
   infile.close( ) ;
   return 0 ;
}

Firstly, why is this code so slow? Secondly, how can I improve it to make it faster?

Recommended answer

The C++ standard library's iostreams are notoriously slow, and this is the case for all the different implementations of the standard library. Why? Because the standard imposes many requirements on the implementation that get in the way of the best performance. This part of the standard library was designed roughly 20 years ago and is not really competitive in high-performance benchmarks.

How can you avoid it? Use other libraries for high-performance async I/O, such as Boost.Asio, or the native functions provided by your OS.

If you want to stay within the standard, the function std::basic_istream::read() may satisfy your performance demands. In that case, however, you have to do the buffering and line counting yourself. Here is one way to do it.

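#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main( int argc, char** argv ) {
   if ( argc < 2 ) {
      std::cerr << "usage: " << argv[ 0 ] << " <file>\n" ;
      return 1 ;
   }
   std::uint64_t linecount = 0 ; // 64-bit: an int could overflow on a very large file
   char last = '\n' ;            // last byte seen so far; '\n' means "no open line"
   std::vector<char> buffer( 1000000 ) ; // buffer of 1MB size
   std::ifstream infile( argv[ 1 ], std::ios::binary ) ;
   while (infile)
   {
       infile.read( buffer.data(), static_cast<std::streamsize>( buffer.size() ) );
       // gcount() is the number of bytes actually read; the final chunk is usually partial
       std::streamsize got = infile.gcount() ;
       linecount += std::count( buffer.begin(), buffer.begin() + got, '\n' );
       if ( got > 0 ) last = buffer[ static_cast<std::size_t>( got - 1 ) ] ;
   }
   if ( last != '\n' ) ++linecount ; // count a final line that has no trailing newline
   std::cout << "linecount: " << linecount << '\n' ;
   return 0 ;
}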

Let me know if it's faster!
