Git tree object and git terminology

Question

I've been learning about git and I'm quite confused by the terminology.

Do I understand it properly that a "tree object" is really something like a "folder object"? It keeps information about the things inside it (blobs) and about other trees (sub-folders). It keeps information about the "actual data" of the project we are working on.

At the same time, the structure of commits/versions has a tree-like structure (really a directed acyclic graph, because of merges, but that's just a detail), and paths to a leaf in this tree could be called branches. "Branches" in git, however, are actually just pointers to commits.
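
To make the "branches are just pointers" idea concrete, here is a toy model in Python. It is an illustration under stated assumptions, not how Git stores anything: commit IDs are made-up short strings, `parents` is a hypothetical dict from each commit to its parent commits, `branches` is a hypothetical dict from a branch name to a single commit ID, and `reachable` is an illustrative helper. The history "on" a branch is simply whatever can be reached by following parent links back from the commit the name points to.

```python
from typing import Dict, List, Set

# Toy commit graph with hypothetical IDs: each commit lists its parents.
# "M" is a merge commit with two parents, which makes this a DAG rather than a tree.
parents: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "M": ["B", "C"],
}

# A branch is nothing more than a name mapped to one commit ID.
branches: Dict[str, str] = {"main": "M", "topic": "C"}

def reachable(start: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from `start` by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen

print(sorted(reachable(branches["main"], parents)))   # ['A', 'B', 'C', 'M']
print(sorted(reachable(branches["topic"], parents)))  # ['A', 'C']

# "Committing on topic" in this model: add a commit, then move the pointer.
parents["D"] = ["C"]
branches["topic"] = "D"
```

Note that moving `topic` changes no commit at all; only the name-to-ID mapping is updated.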

Do I understand this right? Is it just me, or is "tree object" a pretty misleading name, given the already existing tree structure of the "version tree"? Even if you wanted to use the word tree, it would make more sense to call it a "tree node object" or something, since a tree object in git doesn't seem to contain a whole tree, just some blobs and pointers to other trees. The name branches also seems misleading, for similar reasons.

Answer

Except for the user-facing documentation's insistence on using the word tree-ish (if that even is a word), the term tree is internal to Git, so it shouldn't matter what they call it: tree, or marplot, or gripsack, or whatever you like.

That said, a tree object, inside Git, is simply one of the four object types (blob, tree, commit, and annotated tag). What it contains is a series of entries, with each entry holding three items:

  • a mode: an octal number written in ASCII digits, terminated by an ASCII space, with no leading zeros, that describes the type of the entry and gives the x bit for regular files;
  • a name: a byte sequence terminated by an ASCII NUL ('\0' in C, b'\0' in Python); and
  • a raw hash ID: 20 unencoded (binary) bytes.[1]
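
A minimal Python sketch of reading that layout, assuming `body` holds the content of a tree object after its `tree <size>\0` header has been stripped (for example, the raw bytes printed by `git cat-file tree <hash>`) and assuming SHA-1, so each hash ID is 20 bytes. `TreeEntry` and `parse_tree_body` are illustrative names, not part of any Git library.

```python
from typing import List, NamedTuple

class TreeEntry(NamedTuple):
    mode: str    # octal digits as text, e.g. "100644" or "40000"
    name: bytes  # a single path component, not a full path
    oid: bytes   # raw binary hash ID (20 bytes for SHA-1)

def parse_tree_body(body: bytes, hash_len: int = 20) -> List[TreeEntry]:
    """Split a tree object's body into (mode, name, raw hash) entries."""
    entries = []
    pos = 0
    while pos < len(body):
        # mode: ASCII octal digits terminated by a space
        space = body.index(b" ", pos)
        mode = body[pos:space].decode("ascii")
        # name: arbitrary bytes (spaces allowed) terminated by NUL
        nul = body.index(b"\0", space + 1)
        name = body[space + 1:nul]
        # hash: fixed-length raw bytes with no terminator, so the
        # parser must already know how long a hash is
        oid = body[nul + 1:nul + 1 + hash_len]
        if len(oid) != hash_len:
            raise ValueError("truncated tree entry")
        entries.append(TreeEntry(mode, name, oid))
        pos = nul + 1 + hash_len
    return entries
```

Because the hash is stored as raw, fixed-length bytes, the `hash_len` parameter has to come from outside the object itself, which is the awkwardness footnote 1 describes.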

The name in a tree object is really just a single name component. If the mode is 40000, the hash ID must be that of another tree object. If the mode is 120000, 100644, or 100755, the hash ID must be that of a blob object. If the mode is 160000, the hash ID is expected to be that of a commit object stored in some other Git repository, i.e., the entry is a gitlink. Other modes are generally not allowed, though git fsck allows 100664, as this mode appears in some existing (very old) repositories.
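
The mode rules above can be written down as a lookup table. This is a sketch: `MODE_TABLE`, `LEGACY_MODES`, and `classify_mode` are hypothetical names, and accepting 100664 as an ordinary file only mirrors the statement that git fsck tolerates it; it is not a copy of Git's own validation code.

```python
from typing import Dict, Tuple

# mode -> (type of object the hash ID must name, what the entry represents)
MODE_TABLE: Dict[str, Tuple[str, str]] = {
    "40000":  ("tree",   "directory (sub-tree)"),
    "100644": ("blob",   "regular file"),
    "100755": ("blob",   "executable file"),
    "120000": ("blob",   "symbolic link (the blob holds the link target)"),
    "160000": ("commit", "gitlink: a commit in another repository"),
}

# Legacy mode found in some very old repositories; treated as a regular file here.
LEGACY_MODES: Dict[str, Tuple[str, str]] = {
    "100664": ("blob", "regular file (legacy group-writable mode)"),
}

def classify_mode(mode: str, allow_legacy: bool = True) -> Tuple[str, str]:
    """Return (expected object type, description) for a tree entry mode."""
    if mode in MODE_TABLE:
        return MODE_TABLE[mode]
    if allow_legacy and mode in LEGACY_MODES:
        return LEGACY_MODES[mode]
    # Anything else, including a zero-padded "040000", is rejected.
    raise ValueError(f"unexpected tree entry mode {mode!r}")
```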

The file name of a blob or (mode 120000) symbolic link is constructed by stringing together the name components of the sub-tree entries that led to it, with a slash after each, and then appending the entry's own name component from the final tree object. That is, if the top-level tree object for some commit is T0, and the blob or symlink appears directly in T0, then the entry's name alone is the name of the file that will hold the blob or symlink.

But if T0 has an entry foo with mode 40000 and hash T1, Git will go on to read tree object T1. If that has an entry bar with mode 100xxx or 120000, the blob object will be a file or symlink whose name is foo/bar. Hence the file's path name is produced by traversing tree objects until reaching a leaf.
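
The traversal can be sketched with a toy in-memory object store. Assumptions: `store` is a hypothetical dict mapping a tree's hash ID to its list of `(mode, name, hash ID)` entries, names are already decoded to `str`, and short IDs such as `"T0"` stand in for real hashes. The illustrative helper `walk_paths` yields the full path of every leaf, joining sub-tree names with slashes as described above.

```python
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[str, str, str]       # (mode, name component, hash ID)
TreeStore = Dict[str, List[Entry]]  # hypothetical: tree hash ID -> its entries

def walk_paths(store: TreeStore, tree_id: str,
               prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (full path, mode, hash ID) for every non-tree entry under tree_id."""
    for mode, name, oid in store[tree_id]:
        path = prefix + name
        if mode == "40000":
            # Sub-tree: descend, extending the prefix with this name and a slash.
            yield from walk_paths(store, oid, path + "/")
        else:
            # Leaf: blob (file), symlink, or gitlink.
            yield path, mode, oid

store: TreeStore = {
    "T0": [("100644", "README", "B1"), ("40000", "foo", "T1")],
    "T1": [("100755", "bar", "B2"), ("120000", "link", "B3")],
}
for path, mode, oid in walk_paths(store, "T0"):
    print(mode, oid, path)
```

Here `foo/bar` comes out exactly as in the T0/T1 example: the name `foo` from T0, a slash, then `bar` from T1.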

For a gitlink (a tree entry with mode 160000), the constructed path name gives the submodule path that Git will look up in .gitmodules if it must clone the submodule, and the hash ID is the commit that will be checked out (git checkout) as a detached HEAD in that other Git repository. For all other entries, the hash ID should be that of an object in this Git repository; otherwise the tree object is incorrect or the repository is inconsistent (or both).
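
A rough sketch of that consistency rule, again on toy data: `objects` is a hypothetical map from hash ID to object type for this repository, and `check_tree` is an illustrative helper, not git fsck. Gitlinks are skipped because their commit lives in the submodule's repository; every other entry must name an object present here, of the type its mode requires.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, str]  # (mode, name component, hash ID)

# Object type each non-gitlink mode must point at.
EXPECTED_TYPE: Dict[str, str] = {
    "40000": "tree",
    "100644": "blob",
    "100755": "blob",
    "120000": "blob",
}

def check_tree(entries: List[Entry], objects: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one tree's entries."""
    problems = []
    for mode, name, oid in entries:
        if mode == "160000":
            # Gitlink: the commit belongs to another repository,
            # so it is not expected to exist here.
            continue
        want = EXPECTED_TYPE.get(mode)
        if want is None:
            problems.append(f"{name}: unexpected mode {mode}")
        elif oid not in objects:
            problems.append(f"{name}: missing {want} {oid}")
        elif objects[oid] != want:
            problems.append(f"{name}: expected {want}, found {objects[oid]}")
    return problems
```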

As someone using Git, you do not have to care about any of this: just put files in the index as usual, and use git write-tree to write everything. Use git read-tree to read the tree that a commit's hash ID refers to, filling the index[2] from that tree. Use git show or git cat-file to obtain a single file's contents using either a hash ID (blob hash) or a path name (commit-hash:path, which git rev-parse can translate and which, for a long time now, git cat-file can handle as well).
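
git write-tree does all of this for you, but a sketch of the bytes involved may help connect the pieces. Assumptions: SHA-1 object IDs, entries given as `(mode, name, raw hash)` tuples, and the illustrative helper names `tree_sort_key`, `serialize_tree`, and `object_id`. An object's ID is the SHA-1 of a `<type> <size>\0` header followed by the body, and tree entries are sorted by name, with a sub-tree's name compared as though it ended in `/`.

```python
import hashlib
from typing import Iterable, Tuple

Entry = Tuple[str, bytes, bytes]  # (mode, name component, raw 20-byte hash)

def tree_sort_key(entry: Entry) -> bytes:
    mode, name, _ = entry
    # A sub-tree sorts as if its name ended in "/", so a file "foo.c"
    # comes before a directory "foo" ('.' is 0x2E, '/' is 0x2F).
    return name + b"/" if mode == "40000" else name

def serialize_tree(entries: Iterable[Entry]) -> bytes:
    """Build a tree body: mode, space, name, NUL, raw hash, for each entry."""
    return b"".join(
        mode.encode("ascii") + b" " + name + b"\0" + raw_hash
        for mode, name, raw_hash in sorted(entries, key=tree_sort_key)
    )

def object_id(obj_type: str, body: bytes) -> str:
    """Hex SHA-1 of '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

empty_blob = object_id("blob", b"")
one_file = [("100644", b"hello.txt", bytes.fromhex(empty_blob))]
print(object_id("tree", serialize_tree(one_file)))
print(object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the well-known empty tree
```

Comparing `object_id("blob", data)` with the output of `git hash-object` on a file holding the same bytes is an easy sanity check for a sketch like this.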


[1] This is kind of a mistake, because when Git moves to longer hash IDs in the future, either the tree objects may have to store truncated hashes, or we'll need a new flavor of tree object. Note that Mercurial's internal tree data structures left more room. Git probably should have used an ASCII-ized hex digest terminated by another NUL. But there are enough other thorny issues here to resolve that this one is fairly minor.

[2] If you set GIT_INDEX_FILE, git read-tree will read the tree into the alternate index whose path name you provided.
