git push and unreferenced objects


Question

Without running git prune or git gc, will git push upload any unreferenced objects? Imagine this commit history:

A <= B <= C <= D <= E

where a new file was added in commit C and deleted again in commit D. Now git rebase --onto B D will result in:

A <= B <= E

and that file is still in .git/objects, as it's referenced by the two now-unreachable commits C and D. What happens in these two cases:

  1. git push <remote> <branch>: will the remote now contain the deleted file, because the file object is still there?

  2. A pull request to the main upstream that the remote was forked from: if the answer to 1 is yes, will that file be merged to upstream even though C and D were never merged with upstream?

Edit: this question complements the case discussed in "Removing unreferenced objects from remote".

Solution

In general, git push won't push any unreferenced objects.

There could be specific cases / optimizations where it might do so, because there's never been any explicit promise about this. But in practice, it doesn't.
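The claim can be checked with a small experiment. The following is a minimal sketch, not a proof: it recreates the A..E history from the question in a temporary directory, rebases E onto B, pushes to a throwaway bare repository, and then asks both sides whether they have the blob that commit C added. The file name secret.txt, the directory names, and the demo identity are assumptions made up for this illustration.

```shell
#!/bin/sh
set -e

# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q local
git init -q --bare remote.git
cd local
git symbolic-ref HEAD refs/heads/somebranch

# commit <message>: stage everything and commit quietly
commit() { git add -A; git commit -q -m "$1"; }

echo base > readme.txt;            commit A
echo more >> readme.txt;           commit B; B=$(git rev-parse HEAD)
echo "large payload" > secret.txt; commit C; C=$(git rev-parse HEAD)
git rm -q secret.txt;              commit D; D=$(git rev-parse HEAD)
echo final >> readme.txt;          commit E

# Remember the blob that only C refers to, then drop C and D from the branch.
blob=$(git rev-parse "$C:secret.txt")
git rebase -q --onto "$B" "$D"

git push -q ../remote.git somebranch

# cat-file -e exits 0 only if the object exists in that repository.
if git cat-file -e "$blob" 2>/dev/null; then local_has=yes; else local_has=no; fi
if git -C ../remote.git cat-file -e "$blob" 2>/dev/null; then remote_has=yes; else remote_has=no; fi

echo "local has blob:  $local_has"
echo "remote has blob: $remote_has"
```

In this setup the blob stays in the local object store (the reflog still reaches C), while the bare repository never receives it.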

Note that after your rebase, the local repository has a new (different hash ID) commit E':

          C--D--E   [reflog / ORIG_HEAD access only]
         /
...--A--B
         \
          E'  <-- somebranch (HEAD)

When you run git push <othergit> somebranch to some other Git, the other Git presents its branch tip commit hash IDs to your Git, and your Git presents the hash ID of commit E' to them. They obviously don't have E' yet since you just made it yourself, so they say they want it (or don't have it), and your Git presents B to them; if they don't have that, they'll take that commit as well, and A as well if needed, and so on backwards through history.

At some point, your Git reaches some commit that they do have, or runs out of commit hash IDs to send. Your two Gits now agree about what is to be sent, and, as a result of these negotiations, your Git knows which commits they already have, and from that, which tree and blob objects they have as well (implied by them having, e.g., commit A and therefore all earlier commits as well).

Your Git now (usually [1]) prepares a so-called thin pack. This is where you see the "counting objects" and "compressing objects" messages. The thin pack contains only those objects they will need to reconstruct the commits you're sending: in our particular example, commits E' and B, for instance. That includes tree and blob objects that they don't have (those not implied by the presence of commit A), but not tree and blob objects that they do have.
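The "what they lack" computation can be approximated locally with git rev-list --objects, which walks from the commits being sent and stops at commits the other side is known to have. The sketch below assumes the other Git already has commit B, so only E' and its new tree and blob should be listed. The file names and demo identity are made up for the illustration, and the real push machinery may differ in detail from this approximation.

```shell
#!/bin/sh
set -e

# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/somebranch

# commit <message>: stage everything and commit quietly
commit() { git add -A; git commit -q -m "$1"; }

echo base > readme.txt;    commit A
echo more >> readme.txt;   commit B; B=$(git rev-parse HEAD)
echo payload > secret.txt; commit C; C=$(git rev-parse HEAD)
git rm -q secret.txt;      commit D; D=$(git rev-parse HEAD)
echo final >> readme.txt;  commit E

blob=$(git rev-parse "$C:secret.txt")
git rebase -q --onto "$B" "$D"

# Objects reachable from somebranch (E') but not from B,
# i.e. what a receiver that already has B would still need.
to_send=$(git rev-list --objects somebranch --not "$B")
printf '%s\n' "$to_send"

# The blob added in C is not reachable from E', so it should not be listed.
if printf '%s\n' "$to_send" | grep -q "$blob"; then listed=yes; else listed=no; fi
echo "blob from C listed: $listed"
```

Because the walk starts at the branch tip, commits C and D (and anything only they refer to) are never visited, regardless of what is lying around in .git/objects.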

This is what makes the pack a "thin" pack: a thin pack is allowed to do delta-compression against missing objects. Let's say commit A has some file that is represented by a 10 megabyte blob object, and commit B and/or E' has some file that is not 100% identical, but shares 99% of that 10 megabyte object. The thin pack's new object can be delta-compressed, saying take 9.9 MB from object _____ (fill in the blank with a hash ID) and add these remaining 100 kB. A regular pack would have to include this "base object", but a thin pack doesn't.

The receiving Git must:

  • take the incoming thin pack
  • inspect the incoming commits, and decide whether to accept them
  • if they're accepted, "fix" the thin pack or convert the objects to loose (unpacked) objects.

The receiving Git now has all necessary objects for the new commits, either as loose objects or in a new fixed-up, no-longer-thin pack. Assuming the latter, this no-longer-thin pack is stored in that repository, so the new objects (plus perhaps some retrieved objects from other packs, if needed) are all in that repository now, in this now-regular pack.
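The sender and receiver halves can be reproduced by hand with the plumbing commands git pack-objects --thin and git index-pack --fix-thin. This is only a rough sketch of what a push does internally: it skips the acceptance checks and the ref update, and whether Git actually stores the new blob as a delta against the old one is left to its heuristics. The file name data.txt, the directory names, and the demo identity are assumptions for the illustration.

```shell
#!/bin/sh
set -e

# Hypothetical identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q local
git init -q --bare remote.git
cd local
git symbolic-ref HEAD refs/heads/somebranch

# Commit A: a moderately large file, which the receiver gets via a normal push.
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "line", i }' > data.txt
git add data.txt; git commit -q -m A; A=$(git rev-parse HEAD)
git push -q ../remote.git somebranch

# Commit B: almost the same file, so a delta against A's blob is attractive.
echo "one more line" >> data.txt
git add data.txt; git commit -q -m B; B=$(git rev-parse HEAD)

if git -C ../remote.git cat-file -e "$B" 2>/dev/null; then had_before=yes; else had_before=no; fi

# Sender side: pack what is reachable from B but not from A.
# --thin permits deltas whose base object is not included in the pack.
printf '%s\n^%s\n' "$B" "$A" |
    git pack-objects -q --thin --stdout --revs > ../thin.pack

# Receiver side: --fix-thin appends any missing base objects from the
# receiver's own object store, producing a regular, self-contained pack.
git -C ../remote.git index-pack --stdin --fix-thin < ../thin.pack > /dev/null

# Walking B's full history fails if any needed object is missing.
git -C ../remote.git rev-list --objects "$B" > /dev/null
if git -C ../remote.git cat-file -e "$B" 2>/dev/null; then has_after=yes; else has_after=no; fi

echo "receiver had B before: $had_before"
echo "receiver has B after:  $has_after"
```

Comparing the size of thin.pack with the size of data.txt gives a feel for how much the delta saves when one is used.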

(At some point it becomes profitable to repack the packs. This part gets quite complicated.)


[1] This depends on the protocol used to talk between your Git and their Git. The other option is to upload each object one at a time, which tends to be terribly wasteful in terms of bytes sent over the network, so people generally don't use the old protocols now.
