What does git do when we do: git gc - git prune

Question

What's going on in the background when launching:

  • git gc
  • git prune

Output of git gc:

Counting objects: 945490, done. 
Delta compression using up to 4 threads.   
Compressing objects: 100% (334718/334718), done. 
Writing objects: 100%   (945490/945490), done. 
Total 945490 (delta 483105), reused 944529 (delta 482309) 
Checking connectivity: 948048, done.

Output of git prune:

Checking connectivity: 945490, done.

What is the difference between these two options?

Thanks

Recommended answer

TL;DR

git prune only removes loose, unreachable, stale objects (objects must have all three properties to get pruned). Unreachable packed objects remain in their pack files. Reachable loose objects remain reachable and loose. Objects that are unreachable, but are not yet stale, also remain untouched. The definition of stale is a little tricky (see details below).

git gc does more: it packs references, packs useful objects, expires reflog entries, prunes loose objects, prunes removed worktrees, and prunes / gc's old git rerere data.

I'm not sure what you mean by "in the background" above (background has a technical meaning in shells and all of the activity here takes place in the shell's foreground but I suspect you did not mean these terms).

What git gc does is to orchestrate a whole series of collection activities, including but not limited to git prune. The list below is the set of commands run by a foreground gc without --auto (omitting their arguments, which depend to some extent on git gc arguments):

  • git pack-refs: compact references (turn .git/refs/heads/... and .git/refs/tags/... entries into entries in .git/packed-refs, eliminating the individual files)
  • git reflog expire: expire old reflog entries
  • git repack: pack loose objects into packed object format
  • git prune: remove unwanted loose objects
  • git worktree prune: remove worktree data for added worktrees that the user has deleted
  • git rerere gc: remove old rerere records
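
As a rough illustration of that ordering, here is a minimal dry-run sketch. The command names are taken from the list above; the arguments are omitted exactly as they are there, and the helper name `print_gc_sequence` is made up for this example. It only prints the steps and never invokes Git, since running some of these by hand without arguments (git prune in particular) is not equivalent to what git gc does.

```python
# Dry-run sketch: the order of the main steps of a foreground `git gc`.
# Command names come from the list above; arguments are deliberately omitted.
# `print_gc_sequence` is a hypothetical helper, not part of Git.
GC_SEQUENCE = [
    ["git", "pack-refs"],          # compact references into packed-refs
    ["git", "reflog", "expire"],   # expire old reflog entries
    ["git", "repack"],             # pack loose objects
    ["git", "prune"],              # remove unwanted loose objects
    ["git", "worktree", "prune"],  # drop data for deleted worktrees
    ["git", "rerere", "gc"],       # remove old rerere records
]


def print_gc_sequence():
    """Print each step in order and return the steps as strings."""
    steps = [" ".join(cmd) for cmd in GC_SEQUENCE]
    for number, step in enumerate(steps, start=1):
        print(f"{number}. {step}")
    return steps


steps = print_gc_sequence()
```

The point to notice is simply that pruning comes after both reflog expiry and repacking.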

There are a few more individual file activities git gc does on its own, but the above is the main sequence. Note that git prune happens after (1) expiring reflogs and (2) running git repack: this is because removing an expired reflog entry may leave an object unreferenced, in which case it is not packed and is then pruned, so that it is completely gone.

Before going into any more detail, it's a good idea to define what an object is, in Git, and what it means for an object to be loose or packed. We also need to understand what it means for an object to be reachable.

Every object has a hash ID (one of those big ugly IDs you have seen in git log, for instance) that is that object's name, for retrieval purposes. Git stores all the objects in a key-value database where the name is the key, and the object itself is the value. Git's objects are therefore how Git stores files and commits, and in fact, there are four object types: A commit object holds an actual commit. A tree object holds sets of pairs:[1] a human-readable name like README or subdir, along with another object's hash ID. That other object is a blob object if the name in the tree is a file name, or it is another tree object if the name is that of a subdirectory. The blob objects hold the actual file contents (but note that the name of the file is in the tree linking to the blob!). The last object type is annotated tag, used for annotated tags, which are not especially interesting here.

Once made, no object can ever be changed. This is because the object's name (its hash ID) is computed by looking at every single bit of the object's content. Change any one bit from a zero to a one or vice versa and the hash ID changes: you now have a different object, with a different name. This is how Git checks that no file has ever been tampered with: if the file contents were changed, the hash ID of the object would change. The object ID is stored in the tree entry, and if the tree object were changed, the tree's ID would change. The tree's ID is stored in the commit, and if the tree ID were changed, the commit's hash would change. So if you know that the commit's hash is a234b67... and the commit's content still hashes to a234b67..., nothing changed in the commit, and the tree ID is still valid. If the tree still hashes to its own name, its content is still valid, so the blob ID is correct; so as long as the blob content hashes to its own name, the blob is correct as well.
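
To make the "name is computed from the content" idea concrete, here is a small sketch of how a blob's ID can be reproduced outside Git. It assumes the SHA-1 object format (the text above does not name the hash function) and the usual `<type> <size>\0` header that Git prepends before hashing; `git_object_id` is a name invented for this example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Hash '<type> <size>\\0<content>' with SHA-1 (assumed object format)."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_object_id("blob", b"")
original_id = git_object_id("blob", b"hello\n")
altered_id = git_object_id("blob", b"hellp\n")  # one character changed

print(empty_id)                   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(original_id != altered_id)  # True: different content, different name
```

The first value should match what `git hash-object` reports for an empty file, and the comparison shows that any change to the content yields a different object name.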

Objects can be loose, which means they are stored as files. The name of the file is just the hash ID.[2] The contents of the loose object are zlib-deflated. Or, objects can be packed, which means many objects are stored in a single pack-file. In this case the contents are not just deflated, they're first delta-compressed. Git picks out a base object (often the latest version of some blob, i.e. file) and then finds additional objects that can be represented as a series of commands: take the base file, remove some text at this offset, add other text at another offset, and so on. The actual format of pack files is documented, if a bit lightly, in Git's technical documentation on the pack format. Note that unlike most version control systems, the delta-compression occurs at a level below the stored-object abstraction: Git stores whole snapshots, then does delta-compression later, on the underlying objects. Git still accesses an object by its hash-ID name; it's just that reading that object involves reading the pack file, finding the object and its underlying delta bases, and reconstructing the complete object on the fly.
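
The loose-object layout can be sketched in a few lines. This is an in-memory illustration only: it assumes SHA-1 object names and the `<type> <size>\0` header, and the helpers `make_loose_object` and `read_loose_object` are hypothetical names, not Git functions. Nothing is written to disk.

```python
import hashlib
import zlib


def make_loose_object(content: bytes, obj_type: str = "blob"):
    """Return (path under .git, deflated bytes) for a loose object."""
    store = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()          # assumed SHA-1 format
    path = f".git/objects/{oid[:2]}/{oid[2:]}"     # split after two characters
    return path, zlib.compress(store)


def read_loose_object(data: bytes):
    """Inflate a loose object and split the header from the content."""
    store = zlib.decompress(data)
    header, _, content = store.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


path, deflated = make_loose_object(b"hello\n")
print(path)
print(read_loose_object(deflated))
```

Reading the object back yields the type and the original bytes, which is all a loose object really is: a deflated header plus content, stored under a path derived from its own hash.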

There's a general rule about pack files that states that any delta-compressed object within a pack file must have all its bases in the same pack file. This means that a pack file is self-contained: there's never a need to open multiple additional pack files to get an object out of a pack that has the object. (This particular rule can be deliberately violated, producing what Git calls a thin pack, but those are intended to be used only to send objects over a network connection to another Git that already has the base objects. The other Git must "fix" or "fatten" the thin pack to make a normal pack file, before leaving it behind for the rest of Git.)
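
The delta idea itself (rebuild an object from a base plus a series of commands) can be sketched as follows. This is a deliberate simplification: real pack deltas are a compact binary encoding, whereas here the commands are plain Python tuples, and `apply_delta` is a hypothetical helper. It also shows why the base must be available wherever the delta is read, which is what the same-pack rule guarantees.

```python
def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild a target from a base object and copy/insert commands.

    ops is a list of ("copy", offset, length) or ("insert", data) tuples;
    this tuple form is an illustration, not Git's on-disk encoding.
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]   # reuse bytes from the base
        elif op[0] == "insert":
            out += op[1]                          # literal new bytes
        else:
            raise ValueError(f"unknown delta command: {op[0]!r}")
    return bytes(out)


base = b"hello world\n"
delta = [("copy", 0, 6), ("insert", b"git"), ("copy", 11, 1)]
target = apply_delta(base, delta)
print(target)  # b'hello git\n'
```

Text that is "removed" is simply never copied, so a delta only needs these two kinds of command.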

Object reachability is a little bit tricky. Let's look first at commit reachability.

Note that when we have a commit object, that commit object itself contains several hash IDs. It has one hash ID for the tree that holds the snapshot that goes with that commit. It also has one or more hash IDs for parent commits, unless this particular commit is a root commit. A root commit is defined as a commit with no parents, so this is a bit circular: a commit has parents, unless it has no parents. It's clear enough though: given some commit, we can draw that commit as a node in a graph, with arrows coming out of the node, one per parent:

<--o
   |
   v

These parent arrows point to the commit's parent or parents. Given a series of single-parent commits we get a simple linear chain:

... <--o  <--o  <--o ...

One of these commits must be the start of the chain: that's the root commit. One of these must be the end, and that's the tip commit. All of the internal arrows point backwards (leftwards) so we can draw this without the arrow-heads, knowing that the root is at the left and the tip is at the right:

o--o--o--o--o

Now we can add a branch name like master. The name simply points to the tip commit:

o--o--o--o--o   <--master

None of the arrows embedded within a commit can ever change, because nothing in any object can ever change. The arrow in the branch name master, however, is actually just the hash ID of some commit, and this can change. Let's use letters to represent the commit hashes:

A--B--C--D--E   <-- master

The name master now just stores the commit hash of commit E. If we add a new commit to master, we do this by writing out a commit whose parent is E and whose tree is our snapshot, giving us an all-new hash, which we can call F. Commit F points back to E. We have Git write F's hash ID into master and now we have:

A--B--C--D--E--F   <-- master

We added one commit and changed one name, master. All the previous commits are reachable by starting at the name master. We read out the hash ID of F and read commit F. This has the hash ID of E, so we have reached commit E. We read E to get the hash ID of D, and thus reach D. We repeat until we read A, find that it has no parent, and are done.

If there are branches, that just means that we have commits found by another name whose parents are one of the commits also found by the name master:

A--B--C--D--E--F   <-- master
             \
              G--H   <-- develop

The name develop locates commit H; H finds G; and G refers back to E. So all of these commits are reachable.

Commits with more than one parent—i.e., merge commits—make all their parents reachable if the commit itself is reachable. So once you make a merge commit, you can (but do not have to) delete the branch name that identifies the commit that was merged-in: it's now reachable from the tip of the branch that you were on when you did the merge operation. That is:

...--o--o---o   <-- name
      \    /
       o--o   <-- delete-able

The commits on the bottom row here are reachable from name, through the merge, just as the commits on the top row were always reachable from name. Deleting the name delete-able leaves them still reachable. If the merge commit is not there, as in this case:

...--o--o   <-- name2
      \
       o--o   <-- not-delete-able

then deleting not-delete-able effectively abandons the two commits along the bottom row: they become unreachable, and hence eligible for garbage-collection.
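
Commit reachability as described so far amounts to a graph walk from the names. The sketch below models the earlier A..H example with a plain dictionary of parent links; the `reachable` function and the data layout are illustrative assumptions, not how Git stores anything.

```python
def reachable(parents, refs):
    """Return every commit that can be reached by following parent links
    from the commits the given names point to."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# A--B--C--D--E--F  <-- master, with G--H branching off E  <-- develop
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["E"], "G": ["E"], "H": ["G"],
}
refs = {"master": "F", "develop": "H"}

unreachable_before = set(parents) - reachable(parents, refs)
del refs["develop"]                     # delete the only name that finds H
unreachable_after = set(parents) - reachable(parents, refs)

print(sorted(unreachable_before))  # []
print(sorted(unreachable_after))   # ['G', 'H']
```

Deleting the name develop changes nothing inside any commit, yet G and H stop being findable, which is exactly what makes them eligible for collection.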

This same reachability property applies to tree and blob objects. Commit G has a tree in it, for instance, and this tree has <name, ID> pairs:

A--B--C--D--E--F   <-- master
             \
              G--H   <-- develop
              |
         tree=d097...
            /   \
 README=9fa3... Makefile=0b41...

So from commit G, tree object d097... is reachable; from that tree, blob object 9fa3... is reachable, and so is blob object 0b41.... Commit H might have the very same README object, under the same name (though a different tree): that's fine, that just makes 9fa3 doubly reachable, which is not interesting to Git: Git only cares that it is reachable at all.

External references (branch and tag names, and other references found in Git repositories, including entries in Git's index and any references via linked added work-trees) provide the entry points into the object graph. From these entry points, any object is either reachable (one or more names can lead to it) or unreachable, meaning there are no names by which the object itself can be found. I've omitted annotated tags from this description, but they are generally found via tag names, and an annotated tag object has one object reference (of arbitrary object type) that it finds, making that one object reachable if the tag object itself is reachable.

Because references only refer to one object, but sometimes we do something with a branch name that we want to undo afterward, Git keeps a log of each value a reference had, and when. These reference logs or reflogs let us know what master had in it yesterday, or what was in develop last week. Eventually these reflog entries are old and stale and unlikely to be useful any more, and git reflog expire will discard them.
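
A reflog can be pictured as a list of (time, hash) entries per reference, and expiry as dropping the entries that are too old. The sketch below uses a 90-day cutoff, which is Git's documented default for gc.reflogExpire but is not stated in the text above; the entry names and the `expire_reflog` helper are made up for illustration.

```python
from datetime import datetime, timedelta


def expire_reflog(entries, now, max_age):
    """Keep only the reflog entries that are no older than max_age."""
    return [(when, oid) for when, oid in entries if now - when <= max_age]


now = datetime(2024, 1, 31)
master_reflog = [
    (datetime(2023, 9, 1), "old-tip"),         # about five months old
    (datetime(2024, 1, 30), "yesterday-tip"),  # one day old
]

kept = expire_reflog(master_reflog, now, timedelta(days=90))
print([oid for _, oid in kept])  # ['yesterday-tip']
```

Once the old entry is gone, whatever it pointed to may no longer be reachable from anything, which is why expiry has to run before packing and pruning.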

What git repack does, at a high level, should now be reasonably clear: it turns a collection of many loose objects into a pack file full of all those objects. It can do more, though: it can include all objects from a previous pack. The previous pack becomes superfluous and can be removed afterward. It can also omit any unreachable objects from the pack, turning them instead into loose objects. When git gc runs git repack it does so with options that depend on the git gc options, so the exact semantics vary here, but the default for a foreground git gc is to use git repack -d -l, which has git repack delete redundant packs and run git prune-packed. The prune-packed program removes loose object files that also appear in pack files, so this removes the loose objects that went into the pack. The repack program passes the -l option on to git pack-objects (which is the actual workhorse that builds the pack file) where it means to omit objects that are borrowed from other repositories. (This last option is not important for most normal Git usage.)

In any case, it's git repack—or technically, git pack-objects—that prints the counting, compressing, and writing messages. When it is done you have a new pack file and the old pack file(s) are gone. The new pack file holds all the reachable objects, including the old reachable packed objects and the old reachable loose objects. If loose objects were ejected from one of the old (now torn-down and removed) pack files, they join the other loose (and unreachable) objects cluttering your repository. If they were destroyed during the tear-down, only the existing loose-and-unreachable objects remain.

It's now time for git prune: this finds loose, unreachable objects and removes them. However, it has a safety switch, --expire 2.weeks.ago: by default, as run by git gc, it does not remove such objects if they are not at least two weeks old. This means that any Git program that is in the process of creating new objects, that has not yet hooked them up, has a grace period. The new objects can be loose and unreachable for (by default) fourteen days before git prune will delete them. So a Git program that is busy creating objects has fourteen days during which it can complete the hooking-up of those objects into the graph. If it decides those objects are not worth hooking-up, it can just leave them; 14 days from that point, a future git prune will remove them.

If you run git prune manually, you must choose your --expire argument. The default without --expire is not 2.weeks.ago but instead just now.
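
The "loose, unreachable, and stale" rule can be written down as a single predicate. This is a sketch under stated assumptions: objects are modelled as small records, staleness is judged by a modification time, and `ObjectInfo` and `should_prune` are hypothetical names. The two-week grace period mirrors the --expire 2.weeks.ago value that git gc passes, and a zero grace period mirrors the default of now when git prune is run by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ObjectInfo:
    oid: str
    loose: bool        # stored as its own file rather than in a pack
    reachable: bool    # found by walking from some reference
    mtime: datetime    # when the loose file was last written


def should_prune(obj, now, grace=timedelta(weeks=2)):
    """An object is pruned only if it is loose, unreachable, AND stale."""
    stale = now - obj.mtime >= grace
    return obj.loose and not obj.reachable and stale


now = datetime(2024, 1, 31)
old, recent = datetime(2024, 1, 1), datetime(2024, 1, 25)
objects = [
    ObjectInfo("a", loose=True, reachable=False, mtime=old),     # pruned
    ObjectInfo("b", loose=True, reachable=False, mtime=recent),  # not stale
    ObjectInfo("c", loose=False, reachable=False, mtime=old),    # packed
    ObjectInfo("d", loose=True, reachable=True, mtime=old),      # reachable
]

pruned_by_gc = [o.oid for o in objects if should_prune(o, now)]
pruned_manually = [o.oid for o in objects if should_prune(o, now, timedelta(0))]
print(pruned_by_gc)     # ['a']
print(pruned_manually)  # ['a', 'b']
```

With the grace period set to zero, the recently created object "b" is removed as well, which is the practical difference between a manual git prune and the one git gc runs.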

[1] Tree objects actually hold triples: name, mode, hash. The mode is 100644 or 100755 for a blob object, 040000 for a sub-tree, 120000 for a symbolic link, and so on.

[2] For lookup speed on Linux, the hash is split after the first two characters: the hash name ab34ef56... becomes ab/34ef56... in the .git/objects directory. This keeps the size of each subdirectory within .git/objects small-ish, which tames the O(n^2) behavior of some directory operations. This ties in with git gc --auto, which repacks automatically when one object directory becomes sufficiently large. Git assumes that each subdirectory is about the same size, as the hashes should mostly be uniformly distributed, so it only needs to count one subdirectory.
