Fastest Way To Read and Write Large Files Line By Line in Java


Question

I have been searching for the fastest way to read and then write back large files (0.5 to 1 GB) in Java with limited memory (about 64 MB). Each line in the file represents a record, so I need to read them line by line. The file is a plain text file.

I tried BufferedReader and BufferedWriter, but they don't seem to be the best option. It takes about 35 seconds to read and write a 0.5 GB file, with no processing in between. I think the bottleneck is writing, since reading alone takes about 10 seconds.

I also tried reading into byte arrays, but then searching for the line breaks in each array takes even more time.

Any suggestions please? Thanks

Answer

I suspect your real problem is that you have limited hardware, and what you do in software won't make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on your hard drive because the file is not cached, they won't make much difference.

BTW: 500 MB in 10 seconds, or 50 MB/s, is a typical read speed for an HDD.
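The arithmetic behind that figure can be written out as a small helper. This is only a sketch: the class and method names are made up for illustration, 0.5 GB is taken as 500,000,000 bytes, and the write-only time of 25 seconds is an assumption derived from the question's numbers (35 seconds total minus 10 seconds of reading, assuming the two phases do not overlap).

```java
import java.util.Locale;

// Hypothetical helper, not part of the original answer.
class ThroughputSketch {
    // Bytes per nanosecond equals GB/s, so multiplying by 1000 gives MB/s
    // (1 MB = 10^6 bytes).
    static double mbPerSecond(long bytes, long nanos) {
        return bytes * 1000.0 / nanos;
    }

    public static void main(String[] args) {
        long fileBytes = 500_000_000L;     // 0.5 GB, as in the question
        long readNanos = 10_000_000_000L;  // 10 s, reading alone
        long writeNanos = 25_000_000_000L; // assumed: 35 s total minus 10 s read

        System.out.printf(Locale.ROOT, "read:  %.1f MB/s%n",
                mbPerSecond(fileBytes, readNanos));   // prints "read:  50.0 MB/s"
        System.out.printf(Locale.ROOT, "write: %.1f MB/s%n",
                mbPerSecond(fileBytes, writeNanos));  // prints "write: 20.0 MB/s"
    }
}
```

If the write phase really runs at roughly 20 MB/s, that points at the disk or the lack of write caching rather than at BufferedWriter itself.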

Try running the following to see at what point your system is unable to cache the file efficiently.

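import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

// The class name is arbitrary; the original snippet showed only the two methods.
public class FileSizeTest {
    public static void main(String... args) throws IOException {
        // Try increasing sizes to find where the OS can no longer cache the file.
        for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
            testFileSize(mb);
    }

    private static void testFileSize(int mb) throws IOException {
        File file = File.createTempFile("test", ".txt");
        file.deleteOnExit();
        // Each line is 1024 'A' characters plus the platform line separator.
        char[] chars = new char[1024];
        Arrays.fill(chars, 'A');
        String longLine = new String(chars);

        // Time writing mb * 1024 lines, including the final flush on close.
        long start1 = System.nanoTime();
        PrintWriter pw = new PrintWriter(new FileWriter(file));
        for (int i = 0; i < mb * 1024; i++)
            pw.println(longLine);
        pw.close();
        long time1 = System.nanoTime() - start1;
        // bytes * 1000.0 / nanoseconds gives MB/s (1 MB = 10^6 bytes).
        System.out.printf("Took %.3f seconds to write to a %d MB, file rate: %.1f MB/s%n",
                time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

        // Time reading the file back line by line, discarding each line.
        long start2 = System.nanoTime();
        BufferedReader br = new BufferedReader(new FileReader(file));
        for (String line; (line = br.readLine()) != null; ) {
        }
        br.close();
        long time2 = System.nanoTime() - start2;
        System.out.printf("Took %.3f seconds to read to a %d MB file, rate: %.1f MB/s%n",
                time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
        file.delete();
    }
}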

On a Linux machine with lots of memory.

Took 0.395 seconds to write to a 50 MB, file rate: 133.0 MB/s
Took 0.375 seconds to read to a 50 MB file, rate: 140.0 MB/s
Took 0.669 seconds to write to a 100 MB, file rate: 156.9 MB/s
Took 0.569 seconds to read to a 100 MB file, rate: 184.6 MB/s
Took 1.585 seconds to write to a 250 MB, file rate: 165.5 MB/s
Took 1.274 seconds to read to a 250 MB file, rate: 206.0 MB/s
Took 2.513 seconds to write to a 500 MB, file rate: 208.8 MB/s
Took 2.332 seconds to read to a 500 MB file, rate: 225.1 MB/s
Took 5.094 seconds to write to a 1000 MB, file rate: 206.0 MB/s
Took 5.041 seconds to read to a 1000 MB file, rate: 208.2 MB/s
Took 11.509 seconds to write to a 2001 MB, file rate: 182.4 MB/s
Took 9.681 seconds to read to a 2001 MB file, rate: 216.8 MB/s

On a Windows machine with lots of memory.

Took 0.376 seconds to write to a 50 MB, file rate: 139.7 MB/s
Took 0.401 seconds to read to a 50 MB file, rate: 131.1 MB/s
Took 0.517 seconds to write to a 100 MB, file rate: 203.1 MB/s
Took 0.520 seconds to read to a 100 MB file, rate: 201.9 MB/s
Took 1.344 seconds to write to a 250 MB, file rate: 195.4 MB/s
Took 1.387 seconds to read to a 250 MB file, rate: 189.4 MB/s
Took 2.368 seconds to write to a 500 MB, file rate: 221.8 MB/s
Took 2.454 seconds to read to a 500 MB file, rate: 214.1 MB/s
Took 4.985 seconds to write to a 1001 MB, file rate: 210.7 MB/s
Took 5.132 seconds to read to a 1001 MB file, rate: 204.7 MB/s
Took 10.276 seconds to write to a 2003 MB, file rate: 204.5 MB/s
Took 9.964 seconds to read to a 2003 MB file, rate: 210.9 MB/s
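
To compare your own read-and-write case against the raw rates above, a line-by-line copy along these lines might serve as a baseline. It is a minimal sketch, not a tuned implementation: the class name, the UTF-8 encoding, and the buffer size are all assumptions, and the buffers are kept small enough to fit comfortably in a 64 MB heap. Note that it normalizes every line terminator to '\n' and always ends the output with one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical baseline, not part of the original answer.
class LineCopySketch {
    // Assumed buffer size: 1 Mi chars (about 2 MB) per stream, not a measured optimum.
    static final int BUFFER_CHARS = 1 << 20;

    // Copies the input to the output one line at a time and returns the line count.
    static long copyLines(Path in, Path out, Charset cs) throws IOException {
        long lines = 0;
        try (BufferedReader br = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(in), cs), BUFFER_CHARS);
             BufferedWriter bw = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(out), cs), BUFFER_CHARS)) {
            for (String line; (line = br.readLine()) != null; ) {
                // Per-record processing would go here.
                bw.write(line);
                bw.write('\n'); // readLine() strips the terminator, so add one back
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java LineCopySketch <input> <output>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        long start = System.nanoTime();
        long lines = copyLines(in, out, StandardCharsets.UTF_8);
        long nanos = System.nanoTime() - start;
        // Same rate formula as the benchmark: bytes * 1000.0 / nanoseconds = MB/s.
        System.out.printf(Locale.ROOT, "Copied %d lines in %.3f seconds, rate: %.1f MB/s%n",
                lines, nanos / 1e9, Files.size(in) * 1000.0 / nanos);
    }
}
```

Running it with the heap capped, for example `java -Xmx64m LineCopySketch in.txt out.txt`, should show whether the copy gets anywhere near the rates the benchmark reports on the same machine. If both are slow, the disk is the likely limit, as suggested above.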
