Java大文件磁盘IO性能 [英] Java Large Files Disk IO Performance

查看:360
本文介绍了Java大文件磁盘IO性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的硬盘上有两个(每个2GB的)文件,并希望将它们彼此进行比较:

I have two (2GB each) files on my harddisk and want to compare them with each other:


  • 复制原始文件Windows资源管理器约。

  • 使用阅读java.io.FileInputStream 两次,并比较每个字节每个字节的字节数组需要20分钟以上。

  • java.io.BufferedInputStream buffer is 64kb

  • 比较是一个紧缩循环,如

  • Copying the original files with Windows explorer takes approx. 2-4 minutes (that is reading and writing - on the same physical and logical disk).
  • Reading with java.io.FileInputStream twice and comparing the byte arrays on a byte per byte basis takes 20+ minutes.
  • java.io.BufferedInputStream buffer is 64kb, the files are read in chunks and then compared.
  • Comparison is done is a tight loop like

int numRead = Math.min(numRead[0], numRead[1]);
for (int k = 0; k < numRead; k++)
{
   if (buffer[1][k] != buffer[0][k])
   {
      return buffer[0][k] - buffer[1][k];
   }
}


我能做什么来加快速度? NIO是否应该比纯文件流更快? Java不能使用DMA / SATA技术,而是执行一些缓慢的OS-API调用?

What can I do to speed this up? Is NIO supposed to be faster then plain streams? Is Java unable to use DMA/SATA technologies and does some slow OS-API calls instead?

编辑:

感谢您的答案。我做了一些基于他们的实验。由于Andreas显示


Thanks for the answers. I did some experiments based on them. As Andreas showed


流或 nio 方法差异不大。
更重要的是正确的缓冲区大小。

streams or nio approaches do not differ much.
More important is the correct buffer size.

这是我自己的实验证实。由于文件是以大块读取的,即使额外的缓冲区( BufferedInputStream )也不会给出任何结果。优化比较是可能的,我得到了最好的结果与32倍展开,但比较的时间花费比磁盘读取小,所以加速很小。看起来没有什么我能做; - (

This is confirmed by my own experiments. As the files are read in big chunks, even additional buffers (BufferedInputStream) do not give anything. Optimising the comparison is possible and I got the best results with 32-fold unrolling, but the time spend in comparison is small compared to disk read, so the speedup is small. Looks like there is nothing I can do ;-(

推荐答案

我尝试了三种不同的方法比较两个相同的3,8 gb
第一种方法只使用两个缓冲输入流

I tried out three different methods of comparing two identical 3,8 gb files with buffer sizes between 8 kb and 1 MB. the first first method used just two buffered input streams

第二种方法使用一个线程池,读取两个缓冲输入流不同的线程并在第三个比较,这获得了略高的吞吐量,以高cpu利用率为代价。线程池的管理需要大量的开销与这些短时间运行的任务。

the second approach uses a threadpool that reads in two different threads and compares in a third one. this got slightly higher throughput at the expense of a high cpu utilisation. the managing of the threadpool takes a lot of overhead with those short-running tasks.

第三种方法使用nio,如由laginimaineb

the third approach uses nio, as posted by laginimaineb

发布,你可以看到,一般的方法没有什么不同,更重要的是正确的缓冲区大小。

as you can see, the general approach does not differ much. more important is the correct buffer size.

很奇怪,我使用线程读取少一个字节,我不能发现错误的难处。

what is strange that i read 1 byte less using threads. i could not spot the error tough.

comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

使用的代码:

import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;
    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    @Before
    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";
    }

    @Test
    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            compareWithBufferSize(currentBufferSize);
            currentBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesFutures() 
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            compareWithBufferSizeFutures(myBufferSize);
            myBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }

            myBufferSize *= 2;
        }

    }

    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
                        myBufferSize);
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
                        myBufferSize);
        byte[] buff2 = new byte[myBufferSize];
        int read1;

        start = System.currentTimeMillis();
        totalbytes = 0;
        while ((read1 = inputStream1.read(buff1)) != -1) {
            totalbytes += read1;
            int read2 = inputStream2.read(buff2);
            if (read1 != read2) {
                break;
            }
            if (!Arrays.equals(buff1, buff2)) {
                break;
            }
        }
        if (read1 == -1) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
                        myBufferSize);
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),
                        myBufferSize);

        final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

        if (wasEqual) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Even);
            }
        };
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Even);
            }
        };
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Odd);
            }
        };
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Odd);
            }
        };
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
            }
        };
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);
            }
        };

        ExecutorService executor = Executors.newCachedThreadPool();
        boolean isEven = true;
        Future<Integer> read1 = null;
        Future<Integer> read2 = null;
        Future<Boolean> isEqual = null;
        int lastSize = 0;
        while (true) {
            if (isEqual != null) {
                if (!isEqual.get()) {
                    return false;
                } else if (lastSize == -1) {
                    return true;
                }
            }
            if (read1 != null) {
                lastSize = read1.get();
                totalbytes += lastSize;
                final int size2 = read2.get();
                if (lastSize != size2) {
                    return false;
                }
            }
            isEven = !isEven;
            if (isEven) {
                if (read1 != null) {
                    isEqual = executor.submit(oddEqualsArray);
                }
                read1 = executor.submit(read1Even);
                read2 = executor.submit(read2Even);
            } else {
                if (read1 != null) {
                    isEqual = executor.submit(evenEqualsArray);
                }
                read1 = executor.submit(read1Odd);
                read2 = executor.submit(read2Odd);
            }
        }
    }

    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, seconde = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            seconde = new FileInputStream(fileName2).getChannel();
            if (first.size() != seconde.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = seconde.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (seconde != null) {
                seconde.close();
            }
        }
    }

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        first.rewind();
        second.rewind();
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        double megabytes = totalbytes / 1024 / 1024;
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }
}

这篇关于Java大文件磁盘IO性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆