Performance issues when looping an MD5 calculator on many files
Question
I'm creating a program that checks files by comparing their MD5 hashes against a database of MD5s that have already been checked.
It loops through thousands of files, and I see that it uses a lot of memory.
How can I make my code as efficient as possible?
for (File f : directory.listFiles()) {
    String MD5;
    // Hash only image files, and add a file only if its MD5 is not already in the map.
    if (Utils.isImage(f)) {
        MD5 = Utils.toMD5(f);
        if (!SyncFolderMapImpl.MD5Map.containsKey(MD5)) {
            System.out.println("Adding " + f.getName() + " to DB");
            add(new PhotoDTO(f.getPath(), MD5, albumName));
        }
    }
}
This is toMD5:
public static String toMD5(File file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    FileInputStream fis = new FileInputStream(file.getPath());
    byte[] dataBytes = new byte[8192];
    int nread = 0;
    // Feed the file to the digest in 8 KB chunks.
    while ((nread = fis.read(dataBytes)) != -1) {
        md.update(dataBytes, 0, nread);
    }
    byte[] mdbytes = md.digest();
    // Convert the digest bytes to a lowercase hex string.
    StringBuffer hexString = new StringBuffer();
    for (int i = 0; i < mdbytes.length; i++) {
        String hex = Integer.toHexString(0xff & mdbytes[i]);
        if (hex.length() == 1) hexString.append('0');
        hexString.append(hex);
    }
    return hexString.toString();
}
EDIT: I tried FastMD5, with the same result.
public static String toMD5(File file) throws IOException, NoSuchAlgorithmException {
    return MD5.asHex(MD5.getHash(file));
}
EDIT 2: I tried using ThreadLocal and BufferedInputStream. Memory usage is still high.
private static ThreadLocal<MessageDigest> md = new ThreadLocal<MessageDigest>() {
    protected MessageDigest initialValue() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
        System.out.println("Fail");
        return null;
    }
};
private static ThreadLocal<byte[]> dataBytes = new ThreadLocal<byte[]>() {
    protected byte[] initialValue() {
        return new byte[1024];
    }
};
public static String toMD5(File file) throws IOException, NoSuchAlgorithmException {
    // MessageDigest mds = md.get();
    BufferedInputStream fis = new BufferedInputStream(new FileInputStream(file));
    // byte[] dataBytes = new byte[1024];
    int nread = 0;
    while ((nread = fis.read(dataBytes.get())) != -1) {
        md.get().update(dataBytes.get(), 0, nread);
    }
    byte[] mdbytes = md.get().digest();
    // Convert the digest bytes to a lowercase hex string.
    StringBuffer hexString = new StringBuffer();
    fis.close();
    System.gc();
    return javax.xml.bind.DatatypeConverter.printHexBinary(mdbytes).toLowerCase();
    // return MD5.asHex(MD5.getHash(file));
}
Recommended answer
Thanks for the help, everyone.
The problem was that so much data was passing through so quickly that the GC could not keep up. The proof-of-concept solution was to add a Thread.sleep(1000) after every 200 photos.
A full solution would be to take a more aggressive approach with the GC and to calculate the MD5s in batches.
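The answer does not include code, so the following is only a minimal sketch of the batching idea it describes, not the author's actual fix. The class name BatchedMd5, the method hashAll, and its batchSize and pauseMillis parameters are hypothetical names chosen for illustration; the defaults of 200 files and 1000 ms come from the answer. The sketch also closes each stream with try-with-resources, which is an addition of mine and not something the answer claims to be the cause.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class BatchedMd5 {

    // Values taken from the answer: pause for 1000 ms after every 200 files.
    public static final int BATCH_SIZE = 200;
    public static final long PAUSE_MILLIS = 1000;

    /** Streams one file through MD5 in 8 KB chunks and returns lowercase hex. */
    public static String toMD5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        // try-with-resources closes the stream even if read() throws.
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Hashes the files in order, sleeping after each full batch
     * (but not after the final file) to give the GC time to catch up.
     */
    public static List<String> hashAll(List<Path> files, int batchSize, long pauseMillis)
            throws IOException, NoSuchAlgorithmException, InterruptedException {
        List<String> hashes = new ArrayList<>();
        int processed = 0;
        for (Path f : files) {
            hashes.add(toMD5(f));
            processed++;
            if (processed % batchSize == 0 && processed < files.size()) {
                Thread.sleep(pauseMillis);
            }
        }
        return hashes;
    }
}
```

A caller could pass BatchedMd5.BATCH_SIZE and BatchedMd5.PAUSE_MILLIS to hashAll to reproduce the numbers from the answer, then look up each returned hash in the existing MD5 map. Whether a pause is needed at all depends on the workload, so it is worth measuring with and without it.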