FileChannel ByteBuffer and Hashing Files


Question

I built a file hashing method in Java that takes an input string representation of a filepath + filename and then calculates the hash of that file. The hash can be any of the natively supported Java hashing algorithms, such as MD2 through SHA-512.

I am trying to eke out every last drop of performance, since this method is an integral part of a project I'm working on. I was advised to try using FileChannel instead of a regular FileInputStream.

My original method:

    /**
     * Gets Hash of file.
     * 
     * @param file String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *     Supported algorithms are: <br/>
     *     MD2, MD5 <br/>
     *     SHA-1 <br/>
     *     SHA-256, SHA-384, SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException If file is invalid.
     * @throws HashTypeException If no supported or valid hash algorithm was found.
     */
    public String getHash(String file, String hashAlgo) throws IOException, HashTypeException {
        StringBuffer hexString = null;
        try {
            MessageDigest md = MessageDigest.getInstance(validateHashType(hashAlgo));
            FileInputStream fis = new FileInputStream(file);

            byte[] dataBytes = new byte[1024];

            int nread = 0;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
            fis.close();
            byte[] mdbytes = md.digest();

            hexString = new StringBuffer();
            for (int i = 0; i < mdbytes.length; i++) {
                hexString.append(Integer.toHexString((0xFF & mdbytes[i])));
            }

            return hexString.toString();

        } catch (NoSuchAlgorithmException | HashTypeException e) {
            throw new HashTypeException("Unsupported Hash Algorithm.", e);
        }
    }
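The original method delegates to a `validateHashType` helper that isn't shown in the question. A minimal sketch of what such a validator might look like, substituting `IllegalArgumentException` for the project's custom `HashTypeException` (the class name `HashValidator` is hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class HashValidator {
    // Algorithm names the question lists as supported.
    private static final List<String> SUPPORTED = Arrays.asList(
            "MD2", "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512");

    // Returns the algorithm name unchanged if supported, otherwise throws.
    public static String validateHashType(String hashAlgo) {
        if (SUPPORTED.contains(hashAlgo)) {
            return hashAlgo;
        }
        throw new IllegalArgumentException("Unsupported Hash Algorithm: " + hashAlgo);
    }
}
```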

Refactored method:

    /**
     * Gets Hash of file.
     * 
     * @param file String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *     Supported algorithms are: <br/>
     *     MD2, MD5 <br/>
     *     SHA-1 <br/>
     *     SHA-256, SHA-384, SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException If file is invalid.
     * @throws HashTypeException If no supported or valid hash algorithm was found.
     */
    public String getHash(String fileStr, String hashAlgo) throws IOException, HasherException {

        File file = new File(fileStr);

        MessageDigest md = null;
        FileInputStream fis = null;
        FileChannel fc = null;
        ByteBuffer bbf = null;
        StringBuilder hexString = null;

        try {
            md = MessageDigest.getInstance(hashAlgo);
            fis = new FileInputStream(file);
            fc = fis.getChannel();
            bbf = ByteBuffer.allocate(1024); // allocation in bytes

            int bytes;

            while ((bytes = fc.read(bbf)) != -1) {
                md.update(bbf.array(), 0, bytes);
            }

            fc.close();
            fis.close();

            byte[] mdbytes = md.digest();

            hexString = new StringBuilder();

            for (int i = 0; i < mdbytes.length; i++) {
                hexString.append(Integer.toHexString((0xFF & mdbytes[i])));
            }

            return hexString.toString();

        } catch (NoSuchAlgorithmException e) {
            throw new HasherException("Unsupported Hash Algorithm.", e);
        }
    }

Both return a correct hash; however, the refactored method only seems to cooperate on small files. When I pass in a large file, it completely chokes out and I can't figure out why. I'm new to NIO, so please advise.

Forgot to mention I'm throwing SHA-512s through it for testing.

UPDATE: updating with my current method.

    /**
     * Gets Hash of file.
     * 
     * @param file String path + filename of file to get hash.
     * @param hashAlgo Hash algorithm to use. <br/>
     *     Supported algorithms are: <br/>
     *     MD2, MD5 <br/>
     *     SHA-1 <br/>
     *     SHA-256, SHA-384, SHA-512
     * @return String value of hash. (Variable length dependent on hash algorithm used)
     * @throws IOException If file is invalid.
     * @throws HashTypeException If no supported or valid hash algorithm was found.
     */
    public String getHash(String fileStr, String hashAlgo) throws IOException, HasherException {

        File file = new File(fileStr);

        MessageDigest md = null;
        FileInputStream fis = null;
        FileChannel fc = null;
        ByteBuffer bbf = null;
        StringBuilder hexString = null;

        try {
            md = MessageDigest.getInstance(hashAlgo);
            fis = new FileInputStream(file);
            fc = fis.getChannel();
            bbf = ByteBuffer.allocateDirect(8192); // allocation in bytes - 1024, 2048, 4096, 8192

            int b;

            b = fc.read(bbf);

            while ((b != -1) && (b != 0)) {
                bbf.flip();

                byte[] bytes = new byte[b];
                bbf.get(bytes);

                md.update(bytes, 0, b);

                bbf.clear();
                b = fc.read(bbf);
            }

            fis.close();

            byte[] mdbytes = md.digest();

            hexString = new StringBuilder();

            for (int i = 0; i < mdbytes.length; i++) {
                hexString.append(Integer.toHexString((0xFF & mdbytes[i])));
            }

            return hexString.toString();

        } catch (NoSuchAlgorithmException e) {
            throw new HasherException("Unsupported Hash Algorithm.", e);
        }
    }

So I attempted to benchmark hashing out the MD5 of a 2.92GB file using my original example and my latest update's example. Of course any benchmark is relative, since OS and disk caching and other "magic" going on will skew repeated reads of the same files... but here's a shot at some benchmarks. I loaded each method up and fired it off 5 times after compiling it fresh. The benchmark was taken from the last (5th) run, as this would be the "hottest" run for that algorithm and any "magic" (in my theory, anyway).

Here's the benchmarks so far: 

    Original Method - 14.987909 (s) 
    Latest Method - 11.236802 (s)

That is a 25.03% decrease in time taken to hash the same 2.92GB file. Pretty good.
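The 5-run scheme described above can be sketched as a tiny timing harness; the `Runnable`-based API and the class name `Bench` are my own choices, not from the question:

```java
public class Bench {
    // Runs the task `runs` times and returns the elapsed seconds of the
    // final run, mirroring the "take the 5th, hottest run" approach above.
    public static double lastRunSeconds(Runnable task, int runs) {
        long elapsed = 0;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            elapsed = System.nanoTime() - t0;
        }
        return elapsed / 1e9;
    }
}
```

A real measurement would wrap the `getHash` call in the `Runnable`, e.g. `Bench.lastRunSeconds(() -> hashTheBigFile(), 5)`.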

Answer

Three suggestions:

1) clear buffer after each read

int bytes;
while ((bytes = fc.read(bbf)) != -1) {
    md.update(bbf.array(), 0, bytes);
    bbf.clear();
}

2) do not close both fc and fis; it's redundant, closing fis is enough. The FileInputStream.close API says:

If this stream has an associated channel then the channel is closed as well.
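That javadoc claim is easy to verify: once the stream is closed (here via try-with-resources), the channel it handed out reports closed as well. A small sketch (the class name `CloseDemo` is hypothetical):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class CloseDemo {
    // Returns true if closing the stream also closed its channel,
    // demonstrating that closing fis alone is sufficient.
    public static boolean channelClosedWithStream(String path) throws IOException {
        FileChannel fc;
        try (FileInputStream fis = new FileInputStream(path)) {
            fc = fis.getChannel();
        } // fis.close() runs here
        return !fc.isOpen();
    }
}
```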

3) if you want a performance improvement with FileChannel, use

ByteBuffer.allocateDirect(1024); 
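Putting the three suggestions together, a corrected sketch of the whole method could look like the following. Note two choices that are mine, not from the question or answer: NIO.2's `FileChannel.open` replaces the stream + `getChannel()` pair, and `String.format("%02x", ...)` replaces `Integer.toHexString`, which silently drops leading zeros (e.g. byte `0x0A` becomes `"a"` instead of `"0a"`) in all three listings above. With a direct buffer there is no backing array, so the digest is fed via `MessageDigest.update(ByteBuffer)`:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHasher {
    public static String hash(String file, String hashAlgo)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(hashAlgo);
        // try-with-resources: closing the channel is all the cleanup needed
        try (FileChannel fc = FileChannel.open(Paths.get(file))) {
            ByteBuffer bbf = ByteBuffer.allocateDirect(8192);
            while (fc.read(bbf) != -1) {
                bbf.flip();       // switch buffer from filling to draining
                md.update(bbf);   // consumes the buffer; works with direct buffers
                bbf.clear();      // reset for the next read
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // %02x keeps leading zeros
        }
        return hex.toString();
    }
}
```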

