Algorithm for determining a file's identity (Optimisation)

Views: 95 | Published: 2016/9/19 11:19:45 | Tags: c#, optimization, identity

Question

Recap: I'm looking for a cheap algorithm for determining a file's identity that works the vast majority of the time.

I went ahead and implemented an algorithm that gives me a "pretty unique" hash per file.

The way my algorithm works is:

For files smaller than a certain threshold, I use the full file's content for the identity hash.

For files larger than the threshold, I take N random samples of X bytes each.

I include the file size in the hashed data. (This means files with different sizes result in different hashes, barring a collision in the hash function itself.)
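The three steps above can be sketched roughly as follows. This is a minimal Python illustration, not the implementation the question refers to. The threshold value, the choice of SHA-1, and the way the sample offsets are picked are all assumptions: the question only says the samples are "random", so the sketch seeds the generator with the file size so that the same file always yields the same offsets (otherwise the hash would not be repeatable).

```python
import hashlib
import os
import random

THRESHOLD = 64 * 1024    # hypothetical cut-off; the question does not state one
NUM_SAMPLES = 4          # N, the value tried in the question
SAMPLE_SIZE = 8 * 1024   # X, the value tried in the question


def identity_hash(path):
    """Return a hex digest that identifies the file at `path` cheaply."""
    size = os.path.getsize(path)
    h = hashlib.sha1()
    # Step 3: the file size is part of the hashed data.
    h.update(size.to_bytes(8, "little"))
    with open(path, "rb") as f:
        if size < THRESHOLD:
            # Step 1: small files are hashed in full.
            h.update(f.read())
        else:
            # Step 2: large files contribute N samples of X bytes each.
            # Assumption: offsets are derived from the size, so they are
            # "random" across files but stable for any one file.
            rng = random.Random(size)
            offsets = sorted(
                rng.randrange(0, size - SAMPLE_SIZE + 1)
                for _ in range(NUM_SAMPLES)
            )
            # Sorted offsets mean the reads only ever move forward.
            for offset in offsets:
                f.seek(offset)
                h.update(f.read(SAMPLE_SIZE))
    return h.hexdigest()
```

Calling `identity_hash("movie.mkv")` (a hypothetical path) reads at most 32 KB of a large file regardless of its total size, which is where the speed comes from and also where the risk of missing a difference comes from.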

Questions:

What values should I choose for N and X (how many random samples should I take, and of what size)? I went with 4 samples of 8K each and have not been able to stump the algorithm. I found that increasing the number of samples quickly decreases the speed of the algorithm (because seeks are pretty expensive).

The maths one: how similar do my files need to be for this algorithm to blow up (that is, for 2 different files of the same length to end up with the same hash)?
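One rough way to frame the maths question is to ask what fraction of a file the samples actually read. The sketch below assumes that the two files differ in a single byte, that this byte is equally likely to sit at any position, and that the samples do not overlap. Under those assumptions the chance of missing the difference is simply one minus the sampled fraction. The function names are made up for illustration.

```python
def sampled_fraction(file_size, num_samples=4, sample_size=8 * 1024):
    """Fraction of the file's bytes that the samples read, assuming no overlap."""
    covered = min(file_size, num_samples * sample_size)
    return covered / file_size


def miss_probability(file_size, num_samples=4, sample_size=8 * 1024):
    """Chance that one differing byte, placed uniformly at random,
    falls outside every sample (so both files get the same hash)."""
    return 1.0 - sampled_fraction(file_size, num_samples, sample_size)


one_gib = 2 ** 30
print(sampled_fraction(one_gib))   # 4 * 8192 / 2**30 = 2**-15
print(miss_probability(one_gib))   # 1 - 2**-15
```

With 4 samples of 8K, a 1 GiB file has about 0.003% of its bytes sampled, so a one-byte difference at a uniformly random position would almost always go unnoticed. The uniform assumption is probably pessimistic for files that differ throughout and optimistic for same-size edits concentrated in one place, such as a rewritten tag block at the start or end of a media file.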

The optimization one: are there any ways I can optimize my concrete implementation to improve throughput? (I seem to be able to do about 100 files a second on my system.)

Does this implementation look sane? Can you think of any real-world examples where this will fail? (My focus is on media files.)
