Data integrity in Git?


Question



I have now heard this mentioned a couple of times, that Git provides data integrity. But what does that mean?

I understand that all objects in git are accessed using a SHA-1 checksum and that this checksum is computed from the content of the file. This means that if the file changes, you will get a different checksum.
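The checksum described here is easy to reproduce. Below is a minimal Python sketch, assuming Git's default SHA-1 object format. `git_blob_id` is a hypothetical helper name, not a Git API. Git does not hash the bare file bytes. It first prefixes a short header, the object type, the size, and a NUL byte. For this SHA-1 format, the result should match what `git hash-object` prints for the same content.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to `content` as a blob.

    Git hashes a small header (the word "blob", a space, the size in bytes,
    and a NUL byte) followed by the raw file bytes. The file name is not
    part of the hash, so identical contents always get the same ID.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"hello\n")
edited = git_blob_id(b"hellO\n")  # a single changed character

print(original)            # ce013625030ba8dba906f756967f9e9ca394464a
print(original == edited)  # False
```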

But how does that provide data integrity? If I look up some data based on a checksum (key), will git return an error if it's not found (if it has somehow become corrupted)? I assume that data can still become corrupted when using git: disk read errors, etc.

I don't really see the difference from, e.g., SVN here, or how data integrity is provided in practice in Git.

Solution

If I look up some data based on a checksum (key), will git return an error if it's not found (if it has somehow become corrupted)?

Essentially, yes. Suppose that the original, correct data checksums to 1234. Git stores this checksum and looks up the data by that checksum. (This is how its "content-addressable" storage works: one generally starts with, e.g., a branch name like master, which maps to a commit ID like 56789ab.... This mapping is kept in git's "refs", which are more vulnerable than the rest of the data, but let's assume for the moment that this part remains intact.)
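Content addressing can be illustrated with a toy store. This is a minimal Python 3.9+ sketch, and the `ObjectStore` class is hypothetical, not Git's implementation. Objects are filed under the checksum of their contents. Every read re-hashes the stored bytes, so data damaged after it was written no longer matches its own key and is rejected.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store (hypothetical; not Git's real code).

    Every object is filed under the SHA-1 of "<type> <size>", a NUL byte,
    and the data, so the key is itself a checksum of the value it names.
    """

    def __init__(self) -> None:
        self.objects: dict[str, tuple[str, bytes]] = {}

    @staticmethod
    def object_id(obj_type: str, data: bytes) -> str:
        header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
        return hashlib.sha1(header + data).hexdigest()

    def put(self, obj_type: str, data: bytes) -> str:
        oid = self.object_id(obj_type, data)
        self.objects[oid] = (obj_type, data)
        return oid

    def get(self, oid: str) -> tuple[str, bytes]:
        if oid not in self.objects:
            raise KeyError(f"object {oid} not found")
        obj_type, data = self.objects[oid]
        # Re-hash on every read: damaged bytes no longer match their key.
        if self.object_id(obj_type, data) != oid:
            raise ValueError(f"object {oid} is corrupt (checksum mismatch)")
        return obj_type, data


store = ObjectStore()
oid = store.put("blob", b"hello\n")
print(store.get(oid))  # ('blob', b'hello\n')

store.objects[oid] = ("blob", b"jello\n")  # simulate damage on disk
try:
    store.get(oid)
except ValueError as err:
    print(err)  # reports the checksum mismatch
```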

Git then extracts the commit by ID, and compares the checksum of the contents to the ID. This must match, or the commit contents are corrupted. Assuming the contents are valid, they contain a (single) tree ID (plus information about the commit: who made it, when, its parents, and so on).
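The commit step can be sketched as follows. This is Python 3.9+, and `object_id` and `read_commit` are hypothetical helper names. The code assumes the usual textual commit layout: header lines such as `tree`, `parent`, `author`, and `committer`, then a blank line, then the message. Other headers, such as signatures, are simply ignored here. The ID is checked first. Only then is the single tree ID pulled out.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 of "<type> <size>", a NUL byte, then the raw object bytes."""
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def read_commit(objects: dict[str, bytes], oid: str) -> dict:
    """Fetch a commit by ID, verify its checksum, then parse its headers."""
    data = objects[oid]
    if object_id("commit", data) != oid:
        raise ValueError(f"commit {oid} is corrupt")
    header, _, message = data.partition(b"\n\n")
    info = {"tree": None, "parents": [], "message": message.decode()}
    for line in header.decode().split("\n"):
        key, _, value = line.partition(" ")
        if key == "tree":
            info["tree"] = value           # exactly one tree per commit
        elif key == "parent":
            info["parents"].append(value)  # zero or more parents
        elif key in ("author", "committer"):
            info[key] = value              # who made it, and when
    return info


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of Git's empty tree
body = (
    f"tree {EMPTY_TREE}\n"
    "author A U Thor <author@example.com> 1700000000 +0000\n"
    "committer A U Thor <author@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
).encode()
commit_id = object_id("commit", body)
objects = {commit_id: body}
print(read_commit(objects, commit_id)["tree"])  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```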

Git then extracts the tree contents by ID, and compares the checksum of the contents to the ID. This must match, or the tree contents are corrupted. Assuming the contents are valid, they contain a series of entries, each giving a file mode, a name, and an ID. For each entry, the mode distinguishes sub-trees from plain files ("blobs"). The name is the name of the sub-tree or file, and the ID is the checksum of its contents.
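The tree entries can be shown in Git's binary layout. Each entry is the ASCII mode, a space, the name, a NUL byte, and then the 20 raw bytes of the child's ID. This is a minimal Python 3.9+ sketch, and `build_tree` and `parse_tree` are hypothetical helpers. It writes entries in the order given, whereas real Git also enforces a specific sort order, so only correctly ordered input would reproduce Git's own tree IDs.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def build_tree(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) tuples as "<mode> <name>", NUL, 20-byte ID."""
    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in entries
    )


def parse_tree(data: bytes) -> list[tuple[str, str, str, str]]:
    """Split a tree back into (mode, kind, name, hex_id) tuples."""
    entries = []
    i = 0
    while i < len(data):
        nul = data.index(b"\0", i)
        mode, name = data[i:nul].decode().split(" ", 1)
        hex_id = data[nul + 1 : nul + 21].hex()
        # The mode tells sub-trees apart from files; 160000 marks a submodule commit.
        kind = {"40000": "tree", "160000": "commit"}.get(mode, "blob")
        entries.append((mode, kind, name, hex_id))
        i = nul + 21
    return entries


blob_id = object_id("blob", b"hello\n")
empty_tree = object_id("tree", b"")
tree = build_tree([("100644", "hello.txt", blob_id), ("40000", "src", empty_tree)])
for entry in parse_tree(tree):
    print(entry)
```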

Git then extracts the sub-tree or blob contents by ID, and compares the checksum. This must match, or the contents are corrupted. Assuming the contents are valid, a sub-tree is handled recursively as before, and a file is correct (not compromised).
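The whole chain, from commit to tree to sub-trees and blobs, can be sketched as one recursive walk. This is Python 3.9+, and the store layout, mapping an ID to a (type, bytes) pair, and all function names are hypothetical. Each object is loaded only through its ID and refused if the type or checksum is wrong. A single damaged blob anywhere below the commit therefore makes the walk fail.

```python
import hashlib

Store = dict[str, tuple[str, bytes]]  # hypothetical: object ID -> (type, raw bytes)


def object_id(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def put(store: Store, obj_type: str, data: bytes) -> str:
    oid = object_id(obj_type, data)
    store[oid] = (obj_type, data)
    return oid


def load(store: Store, oid: str, expected: str) -> bytes:
    """Fetch an object and refuse it unless type and checksum both match."""
    obj_type, data = store[oid]  # KeyError if the object is missing
    if obj_type != expected or object_id(obj_type, data) != oid:
        raise ValueError(f"{expected} {oid} is corrupt")
    return data


def verify_tree(store: Store, oid: str, prefix: str = "") -> list[str]:
    """Verify a tree and everything below it; return the verified file paths."""
    data = load(store, oid, "tree")
    files: list[str] = []
    i = 0
    while i < len(data):
        nul = data.index(b"\0", i)
        mode, name = data[i:nul].decode().split(" ", 1)
        child = data[nul + 1 : nul + 21].hex()
        i = nul + 21
        if mode == "40000":     # sub-tree: recurse
            files += verify_tree(store, child, prefix + name + "/")
        elif mode != "160000":  # blob (submodule commits live elsewhere)
            load(store, child, "blob")
            files.append(prefix + name)
    return files


def verify_commit(store: Store, oid: str) -> list[str]:
    header = load(store, oid, "commit").split(b"\n\n", 1)[0]
    tree_line = next(line for line in header.split(b"\n") if line.startswith(b"tree "))
    return verify_tree(store, tree_line[5:].decode())


store: Store = {}
hello = put(store, "blob", b"hello\n")
lib = put(store, "tree", b"100644 a.txt\0" + bytes.fromhex(hello))
root = put(store, "tree", b"100644 hello.txt\0" + bytes.fromhex(hello)
           + b"40000 lib\0" + bytes.fromhex(lib))
commit = put(store, "commit", f"tree {root}\n\nMinimal commit for illustration\n".encode())
print(verify_commit(store, commit))  # ['hello.txt', 'lib/a.txt']
```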

Note that along the way, any caught error simply tells you that something has gone wrong, but it does not correct the problem; for that, you need a backup (such as another copy of the repository). If the failure occurs fairly far along the process, it's clear that it's the data that are corrupt, since the checksums were valid long enough to find a commit and a tree and perhaps several sub-trees before the failure.
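The detect-but-not-repair point can be made concrete. This is a minimal Python 3.9+ sketch, and `read_with_backup` and the dict-based stores are hypothetical stand-ins for two copies of a repository. The checksum only says *whether* a copy is good. Repair means taking the bytes from another copy that still verifies.

```python
import hashlib

Store = dict[str, tuple[str, bytes]]  # hypothetical: object ID -> (type, raw bytes)


def object_id(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def is_intact(store: Store, oid: str) -> bool:
    entry = store.get(oid)
    return entry is not None and object_id(*entry) == oid


def read_with_backup(primary: Store, backup: Store, oid: str) -> tuple[str, bytes]:
    """Read from the primary copy; if it fails the checksum, repair it from a backup.

    The checksum cannot say what the right bytes were, so a second,
    independent copy is required to fix anything.
    """
    if is_intact(primary, oid):
        return primary[oid]
    if is_intact(backup, oid):
        primary[oid] = backup[oid]  # overwrite the damaged copy
        return primary[oid]
    raise ValueError(f"object {oid} is corrupt or missing in every copy")


good = ("blob", b"hello\n")
oid = object_id(*good)
primary = {oid: ("blob", b"jello\n")}  # damaged on disk
backup = {oid: good}                    # e.g. another clone of the repository
print(read_with_backup(primary, backup, oid))  # ('blob', b'hello\n')
```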

If the references are corrupted, they are hard to reconstruct. However, git can walk every object in the database and see if any are "unreferenced". Such objects are candidates for where the corrupted references should point. Actually fixing this, in practice, is usually pointlessly hard: you simply go to the same backup you would use in the case of a corrupted blob.
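The search for unreferenced objects is a reachability walk. In a real repository, `git fsck` does this kind of check and reports "dangling" objects. The following is only a toy Python 3.9+ sketch over a hypothetical dict-based store, with hypothetical function names. It marks everything reachable from the refs, and whatever is left over is unreferenced. Among the leftover commits, one that no other leftover commit names as a parent is a natural candidate for where a clobbered ref used to point.

```python
import hashlib

Store = dict[str, tuple[str, bytes]]  # hypothetical: object ID -> (type, raw bytes)


def put(store: Store, obj_type: str, data: bytes) -> str:
    oid = hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()
    store[oid] = (obj_type, data)
    return oid


def children(obj_type: str, data: bytes) -> list[str]:
    """IDs an object points at: a commit's tree and parents, a tree's entries."""
    if obj_type == "commit":
        header = data.split(b"\n\n", 1)[0]
        return [line.split(b" ", 1)[1].decode() for line in header.split(b"\n")
                if line.startswith((b"tree ", b"parent "))]
    if obj_type == "tree":
        ids, i = [], 0
        while i < len(data):
            nul = data.index(b"\0", i)
            if not data.startswith(b"160000 ", i):  # submodule commits live elsewhere
                ids.append(data[nul + 1 : nul + 21].hex())
            i = nul + 21
        return ids
    return []  # blobs point at nothing


def unreferenced(store: Store, refs: dict[str, str]) -> set[str]:
    """Objects in the store that no ref can reach."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in store:
            continue
        seen.add(oid)
        stack.extend(children(*store[oid]))
    return set(store) - seen


store: Store = {}
blob = put(store, "blob", b"hello\n")
tree = put(store, "tree", b"100644 hello.txt\0" + bytes.fromhex(blob))
first = put(store, "commit", f"tree {tree}\n\nfirst\n".encode())
second = put(store, "commit", f"tree {tree}\nparent {first}\n\nsecond\n".encode())

print(unreferenced(store, {"master": second}))    # set()
lost = unreferenced(store, {"master": "0" * 40})  # ref clobbered with garbage
lost_commits = {o for o in lost if store[o][0] == "commit"}
named_by_others = {c for o in lost_commits for c in children(*store[o])}
tips = lost_commits - named_by_others
print(tips == {second})  # True
```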

