How to enhance the speed of my C++ program in reading delimited text files?


Problem description


I show you C# and C++ code that do the same job: read the same text file delimited by "|" and save it as "#"-delimited text (for example, the line a|b|c becomes a#b#c#).

When I execute the C++ program, the elapsed time is 169 seconds.

UPDATE 1: Thanks to Seth (compiling with cl /EHsc /Ox /Ob2 /Oi) and to GWW (moving the declarations of the string variables outside the loops), the elapsed time was reduced to 53 seconds. I have updated the code below accordingly.
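
For reference, the pre-update version presumably declared the strings inside the loops, roughly like the reconstruction below (an approximation, not the original code, with the Timer calls omitted):

#include <fstream>
#include <sstream>
#include <string>

using namespace std;

// Reconstruction (assumption) of the pre-UPDATE-1 structure: s1 and s2 are
// constructed inside the loops, so every row and every field pays for a
// fresh string construction and destruction.
int main ()
{
    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    char const row_delim = '\n';
    char const field_delim = '|';

    while (input)
    {
        string s1;                       // constructed once per row
        if (!getline(input, s1, row_delim))
            break;
        istringstream iss(s1);
        while (iss)
        {
            string s2;                   // constructed once per field
            if (!getline(iss, s2, field_delim))
                break;
            output << s2 << "#";
        }
        output << "\n";
    }
    return 0;
}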

UPDATE 2: Do you have any other suggestion to enhance the C++ code?

When I execute the C# program, the elapsed time is 34 seconds!

The question is: how can I bring the speed of the C++ program up to, or beyond, that of the C# one?

C++ Program:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

using namespace std;

// Timer is a small timing helper class of mine; its definition is not shown here.
int main ()
{
    Timer t;
    cout << t.ShowStart() << endl;

    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    char const row_delim = '\n';
    char const field_delim = '|';
    string s1, s2;

    while (input)
    {
        if (!getline( input, s1, row_delim ))
            break;
        istringstream iss(s1);
        while (iss)
        {
            if (!getline(iss, s2, field_delim ))
                break;
            output << s2 << "#";
        }
        output << "\n";
    }

    t.Stop();
    cout << t.ShowEnd() << endl;
    cout << "Executed in: " << t.ElapsedSeconds() << " seconds." << endl;
    return 0;
}

C# program:

    // Inside a class; requires using System, System.IO, System.Text and System.Diagnostics.
    static void Main(string[] args)
    {
        long i;
        Stopwatch sw = new Stopwatch();
        Console.WriteLine(DateTime.Now);
        sw.Start();
        StreamReader sr = new StreamReader("in.txt", Encoding.Default);
        StreamWriter wr = new StreamWriter("out.txt", false, Encoding.Default);
        object[] cols = new object[0];  // allocates more elements automatically when filling
        string line;
        while (!string.Equals(line = sr.ReadLine(), null)) // Fastest way
        {
            cols = line.Split('|');  // Faster than using a List<>
            foreach (object col in cols)
                wr.Write(col + "#");
            wr.WriteLine();
        }
        sw.Stop();
        Console.WriteLine("Conteo tomó {0} secs", sw.Elapsed);
        Console.WriteLine(DateTime.Now);
    }

UPDATE 3:

Well, I must say I am very happy with the help I received, and my question has been answered to my satisfaction.

I changed the text of the question a little to be more specific, and I tested the solutions kindly provided by Molbdlino and Bo Persson.

Keeping Seth's compile command (i.e. cl /EHsc /Ox /Ob2 /Oi pgm.cpp):

Bo Persson's solution took 18 seconds on average to complete, which is really good considering that the code stays close to what I like.

Molbdlino's solution took 6 seconds on average, which is really amazing! (Thanks to Constantine as well.)

It's never too late to learn, and I learned valuable things from this question.

My best regards.

Solution

As Constantine suggests, read large chunks at a time using read().

I cut the time from ~25s to ~3s on a 129M file with 5M "entries" (26 bytes each) in 100,000 lines.

#include <iostream>
#include <fstream>
#include <sstream>
#include <algorithm>

using namespace std;

int main ()
{
  ifstream input("in.txt");
  ofstream output("out.txt", ios::out);

  const size_t size = 512 * 1024;
  char buffer[size];

  while (input) {
    input.read(buffer, size);                       // read up to 512 KiB at a time
    size_t readBytes = input.gcount();              // bytes actually read (the last chunk may be shorter)
    replace(buffer, buffer + readBytes, '|', '#');  // in-place substitution
    output.write(buffer, readBytes);                // write the chunk back out
  }
  input.close();
  output.close();

  return 0;
}
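
One small behavioral difference worth noting: the original C++ and C# programs write a "#" after every field, including the last one on each row, so the line a|b|c comes out as a#b#c#, while the chunked replace above produces a#b#c. If that trailing "#" matters, a minimal sketch of a chunked variant that keeps it could look like the following (same file names and chunk size as above; this is an illustration, not part of the accepted answer):

#include <fstream>
#include <string>

using namespace std;

// Chunked variant that also reproduces the trailing '#' the original
// programs emit before every newline ("a|b|c" -> "a#b#c#").
int main ()
{
  ifstream input("in.txt");
  ofstream output("out.txt", ios::out);

  const size_t size = 512 * 1024;
  char buffer[size];

  string chunk;                  // rebuilt output for the current chunk
  chunk.reserve(size + size / 8);

  while (input) {
    input.read(buffer, size);
    size_t readBytes = input.gcount();

    chunk.clear();
    for (size_t i = 0; i < readBytes; ++i) {
      char c = buffer[i];
      if (c == '|')
        chunk += '#';            // field delimiter becomes '#'
      else if (c == '\n')
        chunk += "#\n";          // trailing '#' per row (empty rows become "#\n", as in the C# version)
      else
        chunk += c;
    }
    output.write(chunk.data(), chunk.size());
  }
  return 0;
}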
