Git repository internal format explained


Question

Is there any documentation on how Git stores files in its repository? I have tried searching the Internet, but found no usable results. Maybe I'm using the wrong query, or maybe the Git repository internal format is a great secret?

Let me explain why I need this rocket-science information: I'm using C# to get file history from a repository, but this is not currently implemented in the libgit2sharp library. So (as a responsible person ;) I need to implement this feature myself and contribute it to the community.

But since the kernel sources were moved to GitHub, I don't even know where to start my search.

Many thanks in advance!

Solution

The internal format of the repository is extremely simple. Git is in essence a user space file system that's content addressable.

Here's a thumbnail sketch.

Objects

Git stores its internal data structures as objects. There are four kinds of objects: blobs (sort of like files), trees (sort of like directories), commits (snapshots of the file system at particular points in time along with information on how to reach there) and tags (pointers to commits useful for marking important ones).
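
To make "content addressable" concrete, here is a minimal Python sketch of how an object's name is derived from its content: Git hashes a short header (the type, a space, the size in bytes, and a NUL byte) followed by the raw content. The function name git_object_id is my own and not part of any Git API, and the sketch assumes a SHA-1 repository.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 id Git would assign to an object of this type."""
    # Header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has the same well-known id in every SHA-1 repository.
    print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name depends only on the type and the content, two identical files are stored as one blob no matter where they appear in the tree.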

If you look inside the .git directory of a repository, you'll find an objects directory. Each object is stored there in a file named after its SHA-1 hash: the first two hex digits form a subdirectory name and the remaining 38 form the file name. You can inspect objects using the plumbing command git cat-file. Here is an example commit object from one of my repositories:

noufal@sanitarium% git cat-file -p 7347addd901afc7d237a3e9c9512c9b0d05c6cf7
tree c45d8922787a3f801c0253b1644ef6933d79fd4a
parent 4ee56fbe52912d3b21b3577b4a82849045e9ff3f
author Noufal Ibrahim <noufal@..> 1322165467 +0530
committer Noufal Ibrahim <noufal@..> 1322165467 +0530

Added a .md extension to README

You can also find the object itself, zlib-compressed, at .git/objects/73/47addd901afc7d237a3e9c9512c9b0d05c6cf7.
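
If you want to read such a file without going through git cat-file, a rough Python sketch could look like the following. It assumes the object is loose (not yet packed). The helper names are hypothetical, and write_loose_object exists only so that the demo can run in a temporary directory instead of a real repository.

```python
import hashlib
import os
import tempfile
import zlib


def loose_object_path(git_dir: str, sha: str) -> str:
    # First two hex digits name the subdirectory, the remaining 38 the file.
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def read_loose_object(git_dir: str, sha: str) -> tuple:
    """Return (type, content) for a loose object stored under git_dir."""
    with open(loose_object_path(git_dir, sha), "rb") as f:
        raw = zlib.decompress(f.read())
    # The decompressed data is "<type> <size>\0<content>".
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


def write_loose_object(git_dir: str, obj_type: str, content: bytes) -> str:
    """Demo helper: store an object in the loose format and return its id."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    sha = hashlib.sha1(store).hexdigest()
    path = loose_object_path(git_dir, sha)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return sha


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        sha = write_loose_object(git_dir, "blob", b"hello\n")
        print(sha, read_loose_object(git_dir, sha))
```

On a real repository you would pass the path of the .git directory and the id of an object that has not been packed yet.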

You can examine other objects the same way. Each commit points to a tree representing the file system at that point in time and has one parent (more for merge commits, none for the initial commit).
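
For walking file history, the tree and parent lines are the parts that matter. As an illustration, this Python sketch splits the content of a commit object (what git cat-file -p printed above) into its fields. The parse_commit function and the dictionary layout are my own choices, and headers other than tree, parent, author and committer are simply ignored.

```python
def parse_commit(body: bytes) -> dict:
    """Split a commit object's content into header fields and message."""
    text = body.decode("utf-8", errors="replace")
    # A blank line separates the headers from the commit message.
    head, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several
        elif key in ("tree", "author", "committer"):
            commit[key] = value
        # Any other header line is skipped in this sketch.
    return commit


# The example commit shown above, with the e-mail address elided as in the post.
SAMPLE = (
    b"tree c45d8922787a3f801c0253b1644ef6933d79fd4a\n"
    b"parent 4ee56fbe52912d3b21b3577b4a82849045e9ff3f\n"
    b"author Noufal Ibrahim <noufal@..> 1322165467 +0530\n"
    b"committer Noufal Ibrahim <noufal@..> 1322165467 +0530\n"
    b"\n"
    b"Added a .md extension to README\n"
)

if __name__ == "__main__":
    info = parse_commit(SAMPLE)
    print(info["tree"], info["parents"])
```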

Objects are initially stored as individual files in the objects directory. These are called loose objects. When you run git gc, objects that can no longer be reached are pruned and the remaining ones are packed together into a single file and delta compressed. This is more space efficient and compacts the repository. After you run gc, you can look at the .git/objects/pack/ directory to see Git's packfiles. To unpack them, you can use the plumbing command git unpack-objects. The .git/objects/info/packs file contains a list of the packfiles that are currently present.
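
Decoding a whole packfile is considerably more work than reading a loose object, but the header is easy to check. A packfile begins with the four bytes PACK, followed by a 4-byte version number and a 4-byte object count, both big-endian. The sketch below reads only that header. The function name is hypothetical, and the demo builds the 12 bytes by hand rather than opening a real pack.

```python
import struct


def read_pack_header(data: bytes) -> tuple:
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    if len(data) < 12:
        raise ValueError("a pack header is 12 bytes long")
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


if __name__ == "__main__":
    # Hand-built header for illustration: version 2, three objects.
    fake_header = struct.pack(">4sII", b"PACK", 2, 3)
    print(read_pack_header(fake_header))  # (2, 3)
```

For a real repository you would pass the first 12 bytes of one of the .pack files in .git/objects/pack/.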

References

The next thing you need to know is what references are. These are pointers to commits or other objects. Branches and similar things are implemented as references. There are two kinds: "real" references (which are like hard links in a file system) and "symbolic" references (which are pointers to real references, like symbolic links).

These are located in the .git/refs directory. For example, in the above repository, I'm on the master branch. My latest commit is

noufal@sanitarium% git log -1
commit 7347addd901afc7d237a3e9c9512c9b0d05c6cf7
Author: Noufal Ibrahim <noufal@...>
Date:   Fri Nov 25 01:41:07 2011 +0530

    Added a .md extension to README

You can see that my master reference located at .git/refs/heads/master points to this commit.

noufal@sanitarium% more .git/refs/heads/master
7347addd901afc7d237a3e9c9512c9b0d05c6cf7

The current branch is stored in the symbolic reference HEAD located at .git/HEAD. Here it is

noufal@sanitarium% more .git/HEAD
ref: refs/heads/master

It will change if you switch branches.
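
Putting the two kinds of references together, resolving HEAD to a commit id is a matter of following "ref:" lines until a file contains a plain hash. The Python sketch below does that for references stored as individual files. It is an illustration only: resolve_ref is a hypothetical name, and the sketch does not handle references that Git has moved into the .git/packed-refs file.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow symbolic references until a plain object id is found."""
    for _ in range(max_depth):
        with open(os.path.join(git_dir, name), "r", encoding="ascii") as f:
            value = f.read().strip()
        if not value.startswith("ref:"):
            return value  # a "real" reference: the file holds the id itself
        name = value[len("ref:"):].strip()  # symbolic: points at another ref
    raise ValueError("too many levels of symbolic references")


if __name__ == "__main__":
    # Recreate the layout shown above inside a temporary directory.
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
            f.write("7347addd901afc7d237a3e9c9512c9b0d05c6cf7\n")
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/master\n")
        print(resolve_ref(git_dir))  # 7347addd901afc7d237a3e9c9512c9b0d05c6cf7
```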

Similarly, tags are references too, but unlike branches they do not move when you make new commits.

The entire repository is managed using just a DAG (directed acyclic graph) of commits, each of which points to a tree representing the files at a point in time, plus references that point to various commits on the DAG so that you can manipulate them.
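
Since history is just this graph, a history walk is a graph traversal that starts at the commit a reference points to and follows parent links. Here is a small Python sketch on an in-memory graph. The one-letter commit ids and the parents_of mapping are made up for illustration; in a real tool the parents would come from the parent lines of each commit object.

```python
from collections import deque


def history(parents_of: dict, start: str) -> list:
    """Return every commit reachable from start, each once, breadth-first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents_of.get(commit, []):
            if parent not in seen:  # a merge can reach a commit twice
                seen.add(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    # Hypothetical graph: M merges B and C, which both descend from A.
    parents_of = {"M": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
    print(history(parents_of, "M"))  # ['M', 'B', 'C', 'A']
```

To get the history of a single file, you would additionally compare the file's blob id in each commit's tree with the one in its parent and keep only the commits where it differs.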

Further reading

  • I have a presentation which I use for my git trainings up here that explains some of this.
  • The community book at http://book.git-scm.com/ has some sections on the internals.
  • Scott Chacon's Pro Git book has a section on internals.
  • He also has a PeepCode PDF just about the internals.
