Get File Hash Performance/Optimization


Problem Description

I'm trying to get the hash of a file as fast as possible. I have a program that hashes large sets of data (100GB+) consisting of random file sizes (anywhere from a few KB up to 5GB+ per file) across anywhere between a handful of files up to several hundred thousand files.

The program must support all Java supported algorithms (MD2, MD5, SHA-1, SHA-256, SHA-384, SHA-512).

Currently I'm using:

/**
 * Gets Hash of file.
 * 
 * @param file String path + filename of file to get hash.
 * @param hashAlgo Hash algorithm to use. <br/>
 *     Supported algorithms are: <br/>
 *     MD2, MD5 <br/>
 *     SHA-1 <br/>
 *     SHA-256, SHA-384, SHA-512
 * @return String value of hash. (Variable length dependent on hash algorithm used)
 * @throws IOException If file is invalid.
 * @throws HashTypeException If no supported or valid hash algorithm was found.
 */
public String getHash(String file, String hashAlgo) throws IOException, HashTypeException {
    try {
        MessageDigest md = MessageDigest.getInstance(validateHashType(hashAlgo));

        // try-with-resources ensures the stream is closed even if read() throws
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] dataBytes = new byte[1024];
            int nread;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
        }
        byte[] mdbytes = md.digest();

        // Integer.toHexString drops leading zeros, so pad each byte to two hex digits
        StringBuffer hexString = new StringBuffer();
        for (int i = 0; i < mdbytes.length; i++) {
            int b = 0xFF & mdbytes[i];
            if (b < 0x10) {
                hexString.append('0');
            }
            hexString.append(Integer.toHexString(b));
        }

        return hexString.toString();

    } catch (NoSuchAlgorithmException e) {
        throw new HashTypeException("Unsupported Hash Algorithm.", e);
    }
}

Is there a more optimized way to go about getting a file's hash? I'm looking for extreme performance and am not sure if I have gone about this the best way.

Recommended Answer

I see a number of potential performance improvements. One is to use StringBuilder instead of StringBuffer; it's source-compatible but more performant because it's unsynchronized. A second (much more important) would be to use FileChannel and the java.nio API instead of FileInputStream -- or at least, wrap the FileInputStream in a BufferedInputStream to optimize the I/O.
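As a rough sketch of those suggestions combined, the method below wraps the FileInputStream in a BufferedInputStream with a larger buffer and builds the hex string with an unsynchronized StringBuilder. The class and method names here are illustrative, not from the original code, and the helper validateHashType from the question is omitted for self-containment:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BufferedHash {

    /**
     * Hashes a file with the given algorithm (e.g. "MD5", "SHA-256"),
     * reading through a BufferedInputStream to cut down on syscalls.
     */
    public static String hash(String path, String algo)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algo);
        // 64 KB stream buffer plus an 8 KB read buffer; tune for your workload
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 1 << 16)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        // StringBuilder instead of StringBuffer; %02x pads leading zeros
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        java.io.File f = java.io.File.createTempFile("hashdemo", ".bin");
        f.deleteOnExit();
        try (java.io.FileOutputStream out = new java.io.FileOutputStream(f)) {
            out.write("hello".getBytes("UTF-8"));
        }
        // prints 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
        System.out.println(hash(f.getAbsolutePath(), "SHA-256"));
    }
}
```

For very large files, a further step in the answer's direction would be FileChannel with memory-mapped ByteBuffers (FileChannel.map), which can avoid copying data through a heap buffer entirely.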

