How to enhance the speed of my C++ program in reading delimited text files?
Below are a C# and a C++ program that do the same job: read the same text file delimited by "|" and write it back out delimited by "#".
When I run the C++ program, the elapsed time is 169 seconds.
UPDATE 1: Thanks to Seth (compiling with: cl /EHsc /Ox /Ob2 /Oi) and to GWW (moving the string declarations outside the loops), the elapsed time dropped to 53 seconds. I have updated the code accordingly.
UPDATE 2: Do you have any other suggestions to improve the C++ code?
When I run the C# program, the elapsed time is 34 seconds!
The question is: how can I make the C++ version as fast as the C# one?
C++ Program:
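#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Timer is a helper class whose definition is not shown here.
int main ()
{
    Timer t;
    cout << t.ShowStart() << endl;
    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    char const row_delim = '\n';
    char const field_delim = '|';
    string s1, s2;              // declared once, outside the loops
    while (input)
    {
        // Read one row.
        if (!getline( input, s1, row_delim ))
            break;
        istringstream iss(s1);
        while (iss)
        {
            // Split the row into fields and write each one followed by '#'.
            if (!getline(iss, s2, field_delim ))
                break;
            output << s2 << "#";
        }
        output << "\n";
    }
    t.Stop();
    cout << t.ShowEnd() << endl;
    cout << "Executed in: " << t.ElapsedSeconds() << " seconds." << endl;
    return 0;
}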
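The Timer class used above is not shown. For anyone who wants to compile the program, a minimal stand-in might look like the sketch below. Only the method names (ShowStart, ShowEnd, Stop, ElapsedSeconds) come from the code above; the implementation with std::chrono and the timestamp format are assumptions, not the original class.

```cpp
#include <chrono>
#include <ctime>
#include <string>

// Hypothetical stand-in for the Timer used in the question (original not shown).
class Timer {
    using Steady = std::chrono::steady_clock;  // monotonic, for measuring intervals
    using Wall = std::chrono::system_clock;    // wall clock, for printable timestamps

public:
    // Starts timing at construction.
    Timer()
        : steady_start_(Steady::now()), steady_end_(steady_start_),
          wall_start_(Wall::now()), wall_end_(wall_start_), running_(true) {}

    // Freezes the end time.
    void Stop() {
        steady_end_ = Steady::now();
        wall_end_ = Wall::now();
        running_ = false;
    }

    // Wall-clock time at construction, as "YYYY-MM-DD HH:MM:SS".
    std::string ShowStart() const { return Format(wall_start_); }

    // Wall-clock time at Stop(); equals the start time if Stop() was not called.
    std::string ShowEnd() const { return Format(wall_end_); }

    // Seconds from construction to Stop(), or to now while still running.
    double ElapsedSeconds() const {
        const Steady::time_point end = running_ ? Steady::now() : steady_end_;
        return std::chrono::duration<double>(end - steady_start_).count();
    }

private:
    static std::string Format(Wall::time_point tp) {
        const std::time_t t = Wall::to_time_t(tp);
        char buf[32];
        const std::size_t n =
            std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", std::localtime(&t));
        return std::string(buf, n);
    }

    Steady::time_point steady_start_, steady_end_;
    Wall::time_point wall_start_, wall_end_;
    bool running_;
};
```

The interval is taken from steady_clock rather than system_clock so that a system clock adjustment during the run does not distort the measurement.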
C# program:
static void Main(string[] args)
{
    long i;
    Stopwatch sw = new Stopwatch();
    Console.WriteLine(DateTime.Now);
    sw.Start();
    StreamReader sr = new StreamReader("in.txt", Encoding.Default);
    StreamWriter wr = new StreamWriter("out.txt", false, Encoding.Default);
    object[] cols = new object[0]; // allocates more elements automatically when filling
    string line;
    while (!string.Equals(line = sr.ReadLine(), null)) // Fastest way
    {
        cols = line.Split('|'); // Faster than using a List<>
        foreach (object col in cols)
            wr.Write(col + "#");
        wr.WriteLine();
    }
    sw.Stop();
    Console.WriteLine("Conteo tomó {0} secs", sw.Elapsed);
    Console.WriteLine(DateTime.Now);
}
UPDATE 3:
Well, I must say I am very happy with the help I received, and my question has been answered.
I changed the text of the question a little to be more specific, and I tested the solutions kindly proposed by Molbdlino and Bo Persson.
Keeping Seth's suggested compile command (i.e. cl /EHsc /Ox /Ob2 /Oi pgm.cpp):
Bo Persson's solution took 18 seconds on average to complete, a really good result considering that its code is close to the style I prefer.
Molbdlino's solution took 6 seconds on average, really amazing! (Thanks to Constantine as well.)
It is never too late to learn, and I learned valuable things from this question.
My best regards.
As Constantine suggests, read large chunks at a time using read.
I cut the time from ~25s to ~3s on a 129M file with 5M "entries" (26 bytes each) in 100,000 lines.
#include <iostream>
#include <fstream>
#include <sstream>
#include <algorithm>
using namespace std;

int main ()
{
    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    const size_t size = 512 * 1024;   // 512 KiB per read
    char buffer[size];
    while (input) {
        input.read(buffer, size);
        // The last block is usually shorter than the buffer.
        size_t readBytes = input.gcount();
        // Swap the delimiter in place, then write the block out unchanged otherwise.
        replace(buffer, buffer+readBytes, '|', '#');
        output.write(buffer, readBytes);
    }
    input.close();
    output.close();
    return 0;
}