How to efficiently identify a binary file


Problem description

What's the most efficient way to identify a binary file? I would like to extract some kind of signature from a binary file and use it to compare that file with others.

The brute-force approach would be to use the whole file as a signature, which would take too long and too much memory. I'm looking for a smarter approach to this problem, and I'm willing to sacrifice a little accuracy (but not too much, ey) for performance.

(while Java code-examples are preferred, language-agnostic answers are encouraged)

Edit: Scanning the whole file to create a hash has the disadvantage that the bigger the file, the longer it takes. Since the hash wouldn't be unique anyway, I was wondering whether there is a more efficient approach (i.e., a hash from an evenly distributed sampling of bytes).

Recommended answer

An approach I found effective for this sort of thing was to calculate two SHA-1 hashes: one for the first block of the file (I arbitrarily picked 512 bytes as the block size) and one for the whole file. I then stored the two hashes along with the file size. When I needed to identify a file, I would first compare the file length. If the lengths matched, I would compare the hash of the first block, and if that matched, I compared the hash of the entire file. The first two tests quickly weeded out a lot of non-matching files.
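
A minimal Java sketch of this staged comparison might look like the following. The class name `FileSignature`, its fields, and the helper `sha1` are hypothetical names chosen for illustration; only the 512-byte first block, the two SHA-1 hashes, and the size check come from the description above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical holder for the three stored values: size, first-block hash, full hash.
final class FileSignature {
    static final int HEAD_BYTES = 512; // arbitrary block size, as in the answer

    final long size;
    final byte[] headHash;
    final byte[] fullHash;

    private FileSignature(long size, byte[] headHash, byte[] fullHash) {
        this.size = size;
        this.headHash = headHash;
        this.fullHash = fullHash;
    }

    // Computes the signature of a known file once, so it can be stored.
    static FileSignature of(Path file) throws IOException {
        return new FileSignature(Files.size(file),
                sha1(file, HEAD_BYTES),
                sha1(file, Long.MAX_VALUE));
    }

    // Staged check: cheapest test first, full-file hash only as a last resort.
    boolean matches(Path candidate) throws IOException {
        if (Files.size(candidate) != size) {
            return false; // stage 1: lengths differ
        }
        if (!Arrays.equals(sha1(candidate, HEAD_BYTES), headHash)) {
            return false; // stage 2: first block differs
        }
        return Arrays.equals(sha1(candidate, Long.MAX_VALUE), fullHash); // stage 3
    }

    // SHA-1 over at most `limit` leading bytes of the file.
    private static byte[] sha1(Path file, long limit) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is required on every Java platform
        }
        byte[] buf = new byte[8192];
        long remaining = limit;
        try (InputStream in = Files.newInputStream(file)) {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // end of file reached before the limit
                }
                md.update(buf, 0, n);
                remaining -= n;
            }
        }
        return md.digest();
    }
}
```

With this shape, `FileSignature.of(knownFile)` would be computed once and stored, and `signature.matches(candidate)` reads the whole candidate only when both its length and its first block already agree.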
