将大量小文件读入内存的最快方法是什么? [英] What is the fastest way to read a large number of small files into memory?

查看:172
本文介绍了将大量小文件读入内存的最快方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要在每个服务器上读取~50个文件,并将每个文本文件的表示放入内存中。每个文本文件都有自己的字符串(这是用于字符串持有者的最佳类型?)。

I need to read ~50 files on every server start and place each text file's representation into memory. Each text file will have its own string (which is the best type to use for the string holder?).

将文件读入内存的最快方法是什么,什么是保存文本的最佳数据结构/类型,以便我可以在内存中操作它(主要是搜索和替换)?

What is the fastest way to read the files into memory, and what is the best data structure/type to hold the text in so that I can manipulate it in memory (search and replace mainly)?

谢谢

推荐答案

内存映射文件最快......如下所示:

A memory mapped file will be fastest... something like this:

    final File             file;
    final FileChannel      channel;
    final MappedByteBuffer buffer;

    file    = new File(fileName);
    fin     = new FileInputStream(file);
    channel = fin.getChannel();
    buffer  = channel.map(MapMode.READ_ONLY, 0, file.length());

然后继续读取字节缓冲区。

and then proceed to read from the byte buffer.

这将明显快于 FileInputStream FileReader

编辑:

经过一番调查后发现,根据您的操作系统,您可能最好使用新的 BufferedInputStream(new FileInputStream(file))。然而,将所有内容一次性读入char []文件的大小听起来就像是最糟糕的方式。

After a bit of investigation with this it turns out that, depending on your OS, you might be better off using a new BufferedInputStream(new FileInputStream(file)) instead. However reading the whole thing all at once into a char[] the size of the file sounds like the worst way.

所以 BufferedInputStream 应该在所有平台上提供大致一致的性能,而内存映射文件可能很慢或很快,具体取决于底层操作系统。与性能至关重要的一切一样,您应该测试代码并查看最佳效果。

So BufferedInputStream should give roughly consistent performance on all platforms, while the memory mapped file may be slow or fast depending on the underlying OS. As with everything that is performance critical you should test your code and see what works best.

编辑:

好的,这里有一些测试(第一个是完成两次以将文件放入磁盘缓存)。

Ok here are some tests (the first one is done twice to get the files into the disk cache).

我在rt.jar类文件上运行它,提取到硬盘驱动器,这是在Windows 7 beta x64下。这是16784个文件,总共94,706,637个字节。

I ran it on the rt.jar class files, extracted to the hard drive, this is under Windows 7 beta x64. That is 16784 files with a total of 94,706,637 bytes.

首先结果......

First the results...

(记得首先重复以获取磁盘缓存设置)

(remember the first is repeated to get the disk cache setup)


  • ArrayTest

  • ArrayTest


  • time = 83016

  • bytes = 118641472

ArrayTest

ArrayTest


  • 时间= 46570

  • bytes = 118641472

DataInputByteAtATime

DataInputByteAtATime


  • time = 74735

  • bytes = 118641472

DataInputReadFully

DataInputReadFully


  • time = 8953

  • bytes = 118641472

MemoryMapped

MemoryMapped


  • time = 2320

  • bytes = 118641472

以下是代码......

Here is the code...

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;

public class Main
{
    public static void main(final String[] argv)
    {
        ArrayTest.main(argv);
        ArrayTest.main(argv);
        DataInputByteAtATime.main(argv);
        DataInputReadFully.main(argv);
        MemoryMapped.main(argv);
    }
}

abstract class Test
{
    public final void run(final File root)
    {
        final Set<File> files;
        final long      size;
        final long      start;
        final long      end;
        final long      total;

        files = new HashSet<File>();
        getFiles(root, files);

        start = System.currentTimeMillis();

        size = readFiles(files);

        end = System.currentTimeMillis();
        total = end - start;

        System.out.println(getClass().getName());
        System.out.println("time  = " + total);
        System.out.println("bytes = " + size);
    }

    private void getFiles(final File      dir,
                          final Set<File> files)
    {
        final File[] childeren;

        childeren = dir.listFiles();

        for(final File child : childeren)
        {
            if(child.isFile())
            {
                files.add(child);
            }
            else
            {
                getFiles(child, files);
            }
        }
    }

    private long readFiles(final Set<File> files)
    {
        long size;

        size = 0;

        for(final File file : files)
        {
            size += readFile(file);
        }

        return (size);
    }

    protected abstract long readFile(File file);
}

class ArrayTest
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new ArrayTest();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        InputStream stream;

        stream = null;

        try
        {
            final byte[] data;
            int          soFar;
            int          sum;

            stream = new BufferedInputStream(new FileInputStream(file));
            data   = new byte[(int)file.length()];
            soFar  = 0;

            do
            {
                soFar += stream.read(data, soFar, data.length - soFar);
            }
            while(soFar != data.length);

            sum = 0;

            for(final byte b : data)
            {
                sum += b;
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

class DataInputByteAtATime
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new DataInputByteAtATime();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        DataInputStream stream;

        stream = null;

        try
        {
            final int fileSize;
            int       sum;

            stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            fileSize = (int)file.length();
            sum      = 0;

            for(int i = 0; i < fileSize; i++)
            {
                sum += stream.readByte();
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

class DataInputReadFully
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new DataInputReadFully();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        DataInputStream stream;

        stream = null;

        try
        {
            final byte[] data;
            int          sum;

            stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            data   = new byte[(int)file.length()];
            stream.readFully(data);

            sum = 0;

            for(final byte b : data)
            {
                sum += b;
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

class DataInputReadInChunks
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new DataInputReadInChunks();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        DataInputStream stream;

        stream = null;

        try
        {
            final byte[] data;
            int          size;
            final int    fileSize;
            int          sum;

            stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            fileSize = (int)file.length();
            data     = new byte[512];
            size     = 0;
            sum      = 0;

            do
            {
                size += stream.read(data);

                sum = 0;

                for(int i = 0; i < size; i++)
                {
                    sum += data[i];
                }
            }
            while(size != fileSize);

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}
class MemoryMapped
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new MemoryMapped();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        FileInputStream stream;

        stream = null;

        try
        {
            final FileChannel      channel;
            final MappedByteBuffer buffer;
            final int              fileSize;
            int                    sum;

            stream   = new FileInputStream(file);
            channel  = stream.getChannel();
            buffer   = channel.map(MapMode.READ_ONLY, 0, file.length());
            fileSize = (int)file.length();
            sum      = 0;

            for(int i = 0; i < fileSize; i++)
            {
                sum += buffer.get();
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

这篇关于将大量小文件读入内存的最快方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆