md5/sha1 hashing large files
Question
I have over half a million files to hash across multiple folders. An md5/crc hash of everything is taking too long; some files are 1 GB to 11 GB in size. I'm thinking of just hashing part of each file using head.
So the command below works when it comes to finding and hashing everything:
find . -type f -exec sha1sum {} \;
I'm just not sure how to take this a step further and hash only, say, the first 256kB of each file, e.g.
find . -type f -exec head -c 256kB | sha1sum
Not sure if head is okay to use in this instance, or would dd be better? The above command doesn't work, so I'm looking for ideas on how I can do this.
I would like the output to be the same as what a native md5sum prints, e.g. in the below format (going to a text file):
<Hash> <file name>
I'm not sure if the above is possible in a single line, or will a for/do loop need to be used? Performance is key, using bash on RHEL6.
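As an editorial aside on why the -exec attempt above fails: find runs -exec commands directly, without a shell, so the pipe never applies per file (and the {} placeholder is missing entirely). A minimal sketch of a working variant, assuming GNU head (whose -c accepts the 256kB suffix), that keeps the md5sum-style &lt;hash&gt;  &lt;file&gt; output:

```shell
# find's -exec cannot pipe, so spawn a small shell per batch of
# files and run the head | sha1sum pipeline inside it.
find . -type f -exec sh -c '
  for f do
    h=$(head -c 256kB -- "$f" | sha1sum)
    printf "%s  %s\n" "${h%% *}" "$f"   # "<hash>  <file>" format
  done
' sh {} +
```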
Answer
It is unclear where your limitation is. Do you have a slow disk or a slow CPU?
If your disk is not the limitation, you are probably limited by using a single core. GNU Parallel can help with that:
find . -type f | parallel -X sha256sum
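If GNU Parallel happens not to be installed on the RHEL6 box, a rough equivalent (an editorial suggestion, not part of the original answer) is GNU xargs with -P to spread full-file hashing across cores:

```shell
# Batch files 16 at a time and run sha256sum on as many cores as
# the machine has; -print0/-0 keeps unusual file names safe.
find . -type f -print0 | xargs -0 -n 16 -P "$(nproc)" sha256sum
```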
If the limitation is disk I/O, then your idea of head makes perfect sense (note the function below reads the last 1 MB with tail, but the principle is the same):
sha() {
  # Hash only the last 1 MB of the file; perl then replaces the "-"
  # (sha256sum's marker for stdin) with the actual file name.
  tail -c 1M "$1" | sha256sum | perl -pe 'BEGIN{$a=shift} s/-/$a/' "$1"
}
export -f sha
find . -type f -print0 | parallel -0 -j10 --tag sha
The optimal value of -j10 depends on your disk system, so try adjusting it until you find the best value (which can be as low as -j1).
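To get the per-file list into a text file as the question asked, either pipeline can simply be redirected. With full-file hashes the resulting list can even be re-checked later with -c (this does not hold for the tail-based partial hashes, whose values won't match a full re-hash). A hedged sketch, writing the list outside the tree being scanned so it isn't hashed mid-write:

```shell
# Write "<hash>  <file>" lines to a list, then verify later.
find . -type f -exec sha256sum {} + > /tmp/hashes.txt
sha256sum -c /tmp/hashes.txt
```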