Traditional IO vs memory-mapped files in Java


Question


    I'm trying to illustrate to students the difference in performance between traditional IO and memory-mapped files in Java. I found an example somewhere on the internet, but not everything is clear to me; I don't even think all the steps are necessary. I've read a lot about it here and there, but I'm not convinced that either of them is implemented correctly.

    The code I try to understand is:

    public class FileCopy{
        public static void main(String args[]){
            if (args.length < 1){
                System.out.println(" Wrong usage!");
                System.out.println(" Correct usage is : java FileCopy <large file with full path>");
                System.exit(0);
            }
    
    
            String inFileName = args[0];
            File inFile = new File(inFileName);
    
            if (inFile.exists() != true){
                System.out.println(inFileName + " does not exist!");
                System.exit(0);
            }
    
            try{
                new FileCopy().memoryMappedCopy(inFileName, inFileName+".new" );
                new FileCopy().customBufferedCopy(inFileName, inFileName+".new1");
            }catch(FileNotFoundException fne){
                fne.printStackTrace();
            }catch(IOException ioe){
                ioe.printStackTrace();
            }catch (Exception e){
                e.printStackTrace();
            }
    
    
        }
    
        public void memoryMappedCopy(String fromFile, String toFile ) throws Exception{
            long timeIn = new Date().getTime();
            // read input file
            RandomAccessFile rafIn = new RandomAccessFile(fromFile, "rw");
            FileChannel fcIn = rafIn.getChannel();
            ByteBuffer byteBuffIn = fcIn.map(FileChannel.MapMode.READ_WRITE, 0,(int) fcIn.size());
            fcIn.read(byteBuffIn);
            byteBuffIn.flip();
    
            RandomAccessFile rafOut = new RandomAccessFile(toFile, "rw");
            FileChannel fcOut = rafOut.getChannel();
    
            ByteBuffer writeMap = fcOut.map(FileChannel.MapMode.READ_WRITE,0,(int) fcIn.size());
    
            writeMap.put(byteBuffIn);   
    
            long timeOut = new Date().getTime();
            System.out.println("Memory mapped copy Time for a file of size :" + (int) fcIn.size() +" is "+(timeOut-timeIn));
            fcOut.close();
            fcIn.close();
        }
    
    
        static final int CHUNK_SIZE = 100000;
        static final char[] inChars = new char[CHUNK_SIZE];
    
        public static void customBufferedCopy(String fromFile, String toFile) throws IOException{
            long timeIn = new Date().getTime();
    
            Reader in = new FileReader(fromFile);
            Writer out = new FileWriter(toFile);
            while (true) {
                synchronized (inChars) {
                    int amountRead = in.read(inChars);
                    if (amountRead == -1) {
                        break;
                    }
                    out.write(inChars, 0, amountRead);
                }
            }
            long timeOut = new Date().getTime();
            System.out.println("Custom buffered copy Time for a file of size :" + (int) new File(fromFile).length() +" is "+(timeOut-timeIn));
            in.close();
            out.close();
        }
    }
    

    When exactly is it necessary to use RandomAccessFile? Here it is used to read and write in memoryMappedCopy; is it actually necessary just to copy a file at all? Or is it a part of memory mapping?

    In customBufferedCopy, why is synchronized used here?

    I also found a different example that -should- test the performance between the 2:

    public class MappedIO {
        private static int numOfInts = 4000000;
        private static int numOfUbuffInts = 200000;
        private abstract static class Tester {
            private String name;
            public Tester(String name) { this.name = name; }
            public long runTest() {
                System.out.print(name + ": ");
                try {
                    long startTime = System.currentTimeMillis();
                    test();
                    long endTime = System.currentTimeMillis();
                    return (endTime - startTime);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
            public abstract void test() throws IOException;
        }
        private static Tester[] tests = { 
            new Tester("Stream Write") {
                public void test() throws IOException {
                    DataOutputStream dos = new DataOutputStream(
                            new BufferedOutputStream(
                                    new FileOutputStream(new File("temp.tmp"))));
                    for(int i = 0; i < numOfInts; i++)
                        dos.writeInt(i);
                    dos.close();
                }
            }, 
            new Tester("Mapped Write") {
                public void test() throws IOException {
                    FileChannel fc = 
                        new RandomAccessFile("temp.tmp", "rw")
                    .getChannel();
                    IntBuffer ib = fc.map(
                            FileChannel.MapMode.READ_WRITE, 0, fc.size())
                            .asIntBuffer();
                    for(int i = 0; i < numOfInts; i++)
                        ib.put(i);
                    fc.close();
                }
            }, 
            new Tester("Stream Read") {
                public void test() throws IOException {
                    DataInputStream dis = new DataInputStream(
                            new BufferedInputStream(
                                    new FileInputStream("temp.tmp")));
                    for(int i = 0; i < numOfInts; i++)
                        dis.readInt();
                    dis.close();
                }
            }, 
            new Tester("Mapped Read") {
                public void test() throws IOException {
                    FileChannel fc = new FileInputStream(
                            new File("temp.tmp")).getChannel();
                    IntBuffer ib = fc.map(
                            FileChannel.MapMode.READ_ONLY, 0, fc.size())
                            .asIntBuffer();
                    while(ib.hasRemaining())
                        ib.get();
                    fc.close();
                }
            }, 
            new Tester("Stream Read/Write") {
                public void test() throws IOException {
                    RandomAccessFile raf = new RandomAccessFile(
                            new File("temp.tmp"), "rw");
                    raf.writeInt(1);
                    for(int i = 0; i < numOfUbuffInts; i++) {
                        raf.seek(raf.length() - 4);
                        raf.writeInt(raf.readInt());
                    }
                    raf.close();
                }
            }, 
            new Tester("Mapped Read/Write") {
                public void test() throws IOException {
                    FileChannel fc = new RandomAccessFile(
                            new File("temp.tmp"), "rw").getChannel();
                    IntBuffer ib = fc.map(
                            FileChannel.MapMode.READ_WRITE, 0, fc.size())
                            .asIntBuffer();
                    ib.put(0);
                    for(int i = 1; i < numOfUbuffInts; i++)
                        ib.put(ib.get(i - 1));
                    fc.close();
                }
            }
        };
        public static void main(String[] args) {
            for(int i = 0; i < tests.length; i++)
                System.out.println(tests[i].runTest());
        }
    }
    

    I more or less see what's going on; my output looks like this:

    Stream Write: 653
    Mapped Write: 51
    Stream Read: 651
    Mapped Read: 40
    Stream Read/Write: 14481
    Mapped Read/Write: 6
    

    What is making the Stream Read/Write so unbelievably long? And as a read/write test, it looks a bit pointless to me to read the same integer over and over (if I understand well what's going on in the Stream Read/Write). Wouldn't it be better to read ints from the previously written file and just read and write ints in the same place? Is there a better way to illustrate it?

    I've been racking my brain over a lot of these things for a while, and I just can't get the whole picture.

    Solution

    What I see with the "Stream Read/Write" benchmark is:

    • It does not really do stream I/O; it seeks to a specific location in the file. This is unbuffered, so all the I/Os must be completed from disk (the other tests use buffered I/O, so they really read/write in large blocks and the ints are then read from or written to the memory area).
    • It is seeking to the end minus 4 bytes, so it reads the last int and then writes a new int. The file grows in length by one int every iteration. This really doesn't add much to the time cost, though (but it does show that the author of that benchmark either misunderstood something or was not careful).

    This explains the very high cost of that particular benchmark.

    You asked:

    Wouldn't it be better to read ints from the previously written file and just read and write ints in the same place?

    This is what I think the author was trying to do with the last two benchmarks, but that's not what they got. With RandomAccessFile, to read and write the same place in the file you need a seek before both the read and the write:

    raf.seek(raf.length() - 4);
    int val = raf.readInt();
    raf.seek(raf.length() - 4);
    raf.writeInt(val);
    

    This does demonstrate one advantage of memory-mapped I/O, since you can just use the same memory address to access the same bits of the file instead of having to do an additional seek before every call.
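    To make that concrete, here is a sketch (class and method names are my own) of rewriting the same int in place through a mapping: absolute getInt/putInt at a fixed index replace the seek-read-seek-write sequence the RandomAccessFile version needs.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRewrite {
    // Read and rewrite the last int of the file 'times' times.
    // With a mapping, an absolute index replaces the seeks a
    // RandomAccessFile needs for the same access pattern.
    public static void rewriteLastInt(String file, int times) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer map = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());
            int pos = (int) fc.size() - 4;       // offset of the last int
            for (int i = 0; i < times; i++) {
                int val = map.getInt(pos);       // read at a fixed address
                map.putInt(pos, val + 1);        // write back; no seek, no file growth
            }
            map.force();                         // flush changes to the file
        }
    }
}
```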

    By the way, your first benchmark example class may have issues too, since CHUNK_SIZE is not an even multiple of the file system block size. It's often good to use multiples of 1024, and 8192 has been shown to be a good sweet spot for most applications (which is why Java's BufferedInputStream and BufferedOutputStream use that value for their default buffer size). The OS will need to read extra blocks to satisfy read requests that are not on block boundaries. Subsequent reads (of a stream) will reread the same block, possibly some full blocks, and then an extra one again. Memory-mapped I/O always physically reads and writes in blocks, since the actual I/Os are handled by the OS memory manager, which uses its page size. Page size is always optimized to map well to file blocks.
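    The arithmetic is easy to check; 4096 here is just an assumed common block size:

```java
public class BlockAlignment {
    public static void main(String[] args) {
        int block = 4096;                    // assumed file-system block size
        System.out.println(100000 % block);  // the example's CHUNK_SIZE straddles block boundaries
        System.out.println(8192 % block);    // 8192 stays block-aligned
    }
}
```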

    In that example, the memory-mapped test reads everything into a memory buffer and then writes it all back out. These two tests are really not written well enough to compare those two cases; memoryMappedCopy should read and write in the same chunk size as customBufferedCopy.
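    A fairer mapped variant (a sketch; the class name and chunk size are my own choices) would move the same amount of data per step, for example by mapping one region at a time:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkedMappedCopy {
    static final int CHUNK_SIZE = 8192;  // same chunk size as the buffered copy

    // Copy the file one mapped region at a time, so each step moves the
    // same amount of data as the buffered test would.
    public static void copy(String fromFile, String toFile) throws Exception {
        try (FileChannel in = new RandomAccessFile(fromFile, "r").getChannel();
             FileChannel out = new RandomAccessFile(toFile, "rw").getChannel()) {
            long size = in.size();
            for (long pos = 0; pos < size; pos += CHUNK_SIZE) {
                long len = Math.min(CHUNK_SIZE, size - pos);
                MappedByteBuffer src = in.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (src.hasRemaining()) {
                    out.write(src);      // sequential write advances out's position
                }
            }
        }
    }
}
```

    (In practice, mapping many small regions has its own setup cost, so a real comparison would also vary the chunk size.)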

    EDIT: There may even be more things wrong with these test classes. Because of your comment on the other answer, I looked at the first class more carefully again.

    The method customBufferedCopy is static and uses a static buffer. For this kind of test, the buffer should be defined inside the method; then it would not need synchronized (though it doesn't need it in this context, and for these tests, anyway). The static method is also called as if it were an instance method, which is bad programming practice (i.e. use FileCopy.customBufferedCopy(...) instead of new FileCopy().customBufferedCopy(...)).
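    Put together, a corrected version of that method (my sketch, in a hypothetical class) keeps the buffer local, uses the 8192-char size, drops synchronized, and closes the streams even on failure:

```java
import java.io.*;

public class FixedFileCopy {
    static final int CHUNK_SIZE = 8192;  // block-aligned, like the default stream buffers

    // The buffer is local to the method, so no synchronization is needed
    // and concurrent callers share no state.
    public static void customBufferedCopy(String fromFile, String toFile) throws IOException {
        char[] chunk = new char[CHUNK_SIZE];
        try (Reader in = new FileReader(fromFile);
             Writer out = new FileWriter(toFile)) {
            int amountRead;
            while ((amountRead = in.read(chunk)) != -1) {
                out.write(chunk, 0, amountRead);
            }
        }
    }
}
```

    It would then be called as FixedFileCopy.customBufferedCopy(from, to), not through an instance.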

    If you actually ran this test from multiple threads, use of that shared buffer would be contended, and the benchmark would no longer be only about file I/O, so it would not be fair to compare the results of the two test methods.

