Why does Java read a big file faster than C++?
I have a 2 GB file (inputfile.txt) in which every line is a single word, like this:
apple
red
beautiful
smell
spark
input
I need to write a program that reads every word in the file and prints the word count. I wrote it in both Java and C++, and the result is surprising: Java runs 2.3 times faster than C++. My code is as follows:
C++:
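#include <cstdio>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

// NANO is not defined in the original post; nanoseconds per second is assumed.
const double NANO = 1e9;

int main() {
    struct timespec ts, te;
    double cost;
    clock_gettime(CLOCK_REALTIME, &ts);
    ifstream fin("inputfile.txt");
    string word;
    int count = 0;
    // operator>> skips whitespace and reads one whitespace-delimited token
    while (fin >> word) {
        count++;
    }
    cout << count << endl;
    clock_gettime(CLOCK_REALTIME, &te);
    cost = te.tv_sec - ts.tv_sec + (double)(te.tv_nsec - ts.tv_nsec) / NANO;
    printf("Run time: %-15.10f s\n", cost);
    return 0;
}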
Output:
5e+08
Run time: 69.311 s
Java:
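import java.io.BufferedReader;
import java.io.FileReader;

// The class wrapper and its name are assumptions; the original post shows only main().
public class WordCount {
    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();
        FileReader reader = new FileReader("inputfile.txt");
        BufferedReader br = new BufferedReader(reader);
        String str = null;
        int count = 0;
        // readLine() returns null at end of file
        while ((str = br.readLine()) != null) {
            count++;
        }
        System.out.println(count);
        long endTime = System.currentTimeMillis();
        System.out.println("Run time : " + (endTime - startTime) / 1000 + "s");
        br.close();
    }
}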
Output:
5.0E8
Run time: 29 s
Why is Java faster than C++ in this situation, and how do I improve the performance of C++?
You aren't comparing the same thing. The Java program reads lines, splitting only on the newline, while the C++ program reads whitespace-delimited "words", which is a little extra work.
Try istream::getline.
Later
You might also try a raw read into a byte array and scan that for newlines.
Even later
On my old Linux notebook, jdk1.7.0_21 and don't-tell-me-it's-old 4.3.3 take about the same time when the C++ program uses getline. (We have established that reading words is slower.) There isn't much difference between -O0 and -O2, which doesn't surprise me, given how simple the code in the loop is.
Last note
As I suggested, fin.read(buffer, LEN) with LEN = 1 MB, using memchr to scan for '\n', gives a further speed improvement of about 20%, which makes C (there isn't any C++ left by now) faster than Java.