Algorithm for determining a file's identity

Views: 146 | Published: 2015/11/30 16:35:32 | Tags: algorithm, filesystems, virtualfilesystem


Problem description

For an open source project, I am writing an abstraction layer on top of the filesystem.

This layer allows me to attach metadata and relationships to each file.

I would like the layer to handle file renames gracefully and to maintain the metadata when a file is renamed, moved, or copied.

To do this, I need a mechanism for calculating the identity of a file. The obvious solution is to calculate a SHA-1 hash for each file and then assign metadata against that hash. But that is really expensive, especially for movies.
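
For reference, the "obvious solution" might look like the minimal sketch below. The function name `full_sha1` and the 1 MiB chunk size are illustrative assumptions, not part of the project. It shows why the cost is a concern: every byte of the file has to be read.

```python
import hashlib


def full_sha1(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of the whole file.

    Cost grows linearly with file size, which is the problem for movies.
    chunk_size is an assumed value; any reasonable buffer size works.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A renamed or copied file keeps the same digest, so metadata keyed on it survives the move.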

So I have been thinking of an algorithm that, though not 100% correct, will be right the vast majority of the time and is cheap.

One such algorithm could use the file size and a sample of bytes from the file to calculate the hash.

Which bytes should I choose for the sample? How do I keep the calculation cheap and reasonably accurate? I understand there is a tradeoff here, but performance is critical, and the user will be able to handle situations where the system makes mistakes.

I need this algorithm to work for both very large files (1 GB+) and tiny files (5 KB).
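
To make the idea concrete, here is one possible sketch of a size-plus-samples fingerprint. Everything in it is an assumption for illustration: the name `sampled_fingerprint`, the choice of 8 evenly spaced samples of 4 KiB each, and the use of SHA-1 over the samples. Small files are hashed in full, so the 5 KB case stays exact, while a 1 GB file costs only a handful of seeks and reads.

```python
import hashlib
import os

SAMPLE_SIZE = 4096  # bytes read at each position (assumed value)
SAMPLE_COUNT = 8    # number of evenly spaced positions (assumed value)


def sampled_fingerprint(path, sample_size=SAMPLE_SIZE, sample_count=SAMPLE_COUNT):
    """Hash the file size plus a few evenly spaced samples of the content."""
    if sample_count < 2:
        raise ValueError("sample_count must be at least 2")
    size = os.path.getsize(path)
    digest = hashlib.sha1()
    # Mix the size in first: files of different length never collide on it.
    digest.update(str(size).encode("ascii"))
    with open(path, "rb") as f:
        if size <= sample_size * sample_count:
            # Small file: reading everything is as cheap as sampling.
            digest.update(f.read())
        else:
            last_offset = size - sample_size
            for i in range(sample_count):
                # Positions run from the first byte to the final sample_size bytes.
                f.seek(last_offset * i // (sample_count - 1))
                digest.update(f.read(sample_size))
    return digest.hexdigest()
```

The tradeoff is the one described above: two files of equal size that differ only between sample positions get the same fingerprint, so this is a cheap approximation and not a proof of equality.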

Edit

I need this algorithm to work on NTFS and on all SMB shares (Linux- or Windows-based). I would like it to support situations where a file is copied from one spot to another (two physical copies are treated as one identity). I may even want this to work when MP3s are re-tagged (the physical file changes, so I may have an identity provider per file type).
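
As an illustration of what a per-file-type identity provider could do for MP3s, the sketch below skips a leading ID3v2 tag and a trailing ID3v1 tag and fingerprints only the bytes in between. The names `mp3_audio_range` and `mp3_identity` and the 64 KiB sample size are hypothetical. It assumes tags are limited to ID3v2 at the start and ID3v1 at the end; other tag formats (for example APEv2) are not handled.

```python
import hashlib
import os


def _synchsafe_to_int(b):
    # ID3v2 stores the tag size in 4 bytes with 7 significant bits each.
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def mp3_audio_range(path):
    """Return (start, end) offsets of the data between ID3v2 and ID3v1 tags."""
    size = os.path.getsize(path)
    start, end = 0, size
    with open(path, "rb") as f:
        header = f.read(10)
        if len(header) == 10 and header[:3] == b"ID3":
            # Tag size excludes the 10-byte header itself.
            start = 10 + _synchsafe_to_int(header[6:10])
            if header[5] & 0x10:  # footer flag: 10 more bytes after the tag
                start += 10
        if size - start >= 128:
            # An ID3v1 tag is the final 128 bytes and begins with "TAG".
            f.seek(size - 128)
            if f.read(3) == b"TAG":
                end = size - 128
    return min(start, end), end


def mp3_identity(path, sample_size=65536):
    """Fingerprint the audio length plus the first sample_size audio bytes."""
    start, end = mp3_audio_range(path)
    digest = hashlib.sha1()
    digest.update(str(end - start).encode("ascii"))
    with open(path, "rb") as f:
        f.seek(start)
        digest.update(f.read(min(sample_size, end - start)))
    return digest.hexdigest()
```

Because only the audio range feeds the hash, rewriting the tags leaves the identity unchanged as long as the audio data itself is not re-encoded.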

Edit 2

Related question: Algorithm for determining a file's identity (Optimisation), http://stackoverflow.com/questions/788761/algorithm-for-determining-a-files-identity-optimisation
