Why does Java read a big file faster than C++?

Problem description

I have a 2 GB file (inputfile.txt) in which every line is a single word, like this:

apple
red
beautiful
smell
spark
input

I need to write a program that reads every word in the file and prints the word count. I wrote it in both Java and C++, but the result is surprising: Java runs 2.3 times faster than C++. My code is as follows:

C++:

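#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <time.h>   // clock_gettime (POSIX)
using namespace std;

// Nanoseconds per second. Assumed definition; NANO is not shown in the original post.
const double NANO = 1e9;

int main() {
    struct timespec ts, te;
    double cost;
    clock_gettime(CLOCK_REALTIME, &ts);

    ifstream fin("inputfile.txt");
    string word;
    int count = 0;
    // operator>> skips leading whitespace and reads one whitespace-delimited token
    while (fin >> word) {
        count++;
    }
    cout << count << endl;

    clock_gettime(CLOCK_REALTIME, &te);
    cost = te.tv_sec - ts.tv_sec + (double)(te.tv_nsec - ts.tv_nsec) / NANO;
    printf("Run time: %-15.10f s\n", cost);

    return 0;
}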

Output:

5e+08
Run time: 69.311 s

Java:

import java.io.BufferedReader;
import java.io.FileReader;

// The class name is a placeholder; the original post shows only main().
public class Main {
    public static void main(String[] args) throws Exception {

        long startTime = System.currentTimeMillis();

        FileReader reader = new FileReader("inputfile.txt");
        BufferedReader br = new BufferedReader(reader);
        String str = null;
        int count = 0;
        // readLine() splits only on line terminators
        while ((str = br.readLine()) != null) {
            count++;
        }
        br.close();
        System.out.println(count);

        long endTime = System.currentTimeMillis();
        System.out.println("Run time : " + (endTime - startTime) / 1000 + "s");
    }
}

Output:

5.0E8
Run time: 29 s

Why is Java faster than C++ in this situation, and how do I improve the performance of C++?

Solution

You aren't comparing the same thing. The Java program reads lines, splitting only on the newline, while the C++ program reads whitespace-delimited "words", which is a little extra work.

Try istream::getline.

Later

You might also try a basic read operation that fills a byte array, and then scan that array for newlines.

Even later

On my old Linux notebook, jdk1.7.0_21 and don't-tell-me-it's-old 4.3.3 take about the same time when the C++ program uses getline. (We have established that reading words is slower.) There isn't much difference between -O0 and -O2, which doesn't surprise me, given how simple the code in the loop is.

Last note

As I suggested, fin.read(buffer,LEN) with LEN = 1MB and using memchr to scan for '\n' results in another speed improvement of about 20%, which makes C (there isn't any C++ left by now) faster than Java.
