How to efficiently identify a binary file

Views: 113 | Posted: 2015/11/30 16:23:43 | Tags: algorithm, language-agnostic, file, performance, identifier

Question

What's the most efficient way to identify a binary file? I would like to extract some kind of signature from a binary file and use it to compare the file with others.

The brute-force approach would be to use the whole file as a signature, which would take too long and too much memory. I'm looking for a smarter approach to this problem, and I'm willing to sacrifice a little accuracy (but not too much, ey) for performance.

(while Java code-examples are preferred, language-agnostic answers are encouraged)

Edit: Scanning the whole file to create a hash has the disadvantage that the bigger the file, the longer it takes. Since the hash wouldn't be unique anyway, I was wondering if there was a more efficient approach (i.e., a hash computed from an evenly distributed sampling of bytes).
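
To make the sampling idea from the edit concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the class name `SampledSignature`, the sample count (64), and the chunk size (64 bytes) are all made up for this example and are not from the question. It hashes the file length plus a fixed number of evenly spaced chunks, so the cost stays roughly constant regardless of file size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names and constants are assumptions for illustration.
final class SampledSignature {
    static final int SAMPLES = 64; // assumed number of sample points
    static final int CHUNK = 64;   // assumed bytes read at each sample point

    static byte[] sampledSha1(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            // Mix the length into the hash so files of different sizes differ.
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(length).array());
            byte[] buf = new byte[CHUNK];
            if (length <= (long) SAMPLES * CHUNK) {
                // Small file: sampling would cover it anyway, so hash everything.
                int n;
                while ((n = raf.read(buf)) > 0) {
                    md.update(buf, 0, n);
                }
            } else {
                // Evenly spaced offsets; the last chunk ends at or before EOF.
                long stride = (length - CHUNK) / (SAMPLES - 1);
                for (int i = 0; i < SAMPLES; i++) {
                    raf.seek(i * stride);
                    raf.readFully(buf);
                    md.update(buf);
                }
            }
        }
        return md.digest();
    }
}
```

The trade-off is the one the question already accepts: two files that differ only in bytes that fall between sample points produce the same signature, so a match here is a strong hint rather than proof.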

Recommended answer

An approach I found effective for this sort of thing was to calculate two SHA-1 hashes. One for the first block in a file (I arbitrarily picked 512 bytes as a block size) and one for the whole file. I then stored the two hashes along with a file size. When I needed to identify a file I would first compare the file length. If the lengths matched then I would compare the hash of the first block and if that matched I compared the hash of the entire file. The first two tests quickly weeded out a lot of non-matching files.
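
The three-stage check described above could be sketched in Java roughly as follows. This is not the answerer's code: the class name `FileFingerprint` and its methods are hypothetical, and only the 512-byte block size and the use of SHA-1 come from the answer. The stored fingerprint holds the size and both hashes; the candidate file is tested from cheapest to most expensive, so the full hash is computed only when the length and the first-block hash already match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical class: a sketch of the size / first-block / full-file scheme.
final class FileFingerprint {
    static final int BLOCK_SIZE = 512; // block size chosen in the answer

    final long size;
    final byte[] firstBlockHash;
    final byte[] fullHash;

    FileFingerprint(long size, byte[] firstBlockHash, byte[] fullHash) {
        this.size = size;
        this.firstBlockHash = firstBlockHash;
        this.fullHash = fullHash;
    }

    // SHA-1 over at most `limit` bytes from the start of the file.
    static byte[] sha1(Path file, long limit) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            long left = limit;
            int n;
            while (left > 0
                    && (n = in.read(buf, 0, (int) Math.min(buf.length, left))) != -1) {
                md.update(buf, 0, n);
                left -= n;
            }
        }
        return md.digest();
    }

    // Build the fingerprint that gets stored for a known file.
    static FileFingerprint of(Path file) throws IOException, NoSuchAlgorithmException {
        return new FileFingerprint(
                Files.size(file),
                sha1(file, BLOCK_SIZE),
                sha1(file, Long.MAX_VALUE));
    }

    // Cheapest test first; each stage runs only if the previous one matched.
    boolean matches(Path candidate) throws IOException, NoSuchAlgorithmException {
        if (Files.size(candidate) != size) {
            return false;
        }
        if (!Arrays.equals(sha1(candidate, BLOCK_SIZE), firstBlockHash)) {
            return false;
        }
        return Arrays.equals(sha1(candidate, Long.MAX_VALUE), fullHash);
    }
}
```

A caller would compute `FileFingerprint.of(path)` once per known file, persist the three values, and later call `matches(candidate)` against each stored fingerprint. Note that this sketch still reads a matching file in full at the last stage, so it speeds up rejections rather than confirmations.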
