What gets "cloned" and "pushed" during git clone and git push


Question

When I run

git push

git push origin master

and my repo looks like

      B--C--D <- master
     /
    A--E--F <- foo-branch

and origin looks like

A <- master

Does push include commits E and F? I understand that typically it does not include foo-branch, but do all commits still get pushed?

Similarly, when I do

git clone <some-remote-repo>

I know I typically get one branch (seems to be usually master), but do I also have local copies of commits for other branches, even if I don't get the pointers to their heads?

Recommended answer

It's partly transport-dependent: git has "dumb transports" (such as using http to transfer one object at a time) and "smart transports" (using the git:// or ssh:// protocols, where two gits negotiate with each other, then—provided that the receiver indicates that it's OK—the sender builds a "thin pack").

It's also partly command-dependent: for instance, if you ask for a "shallow" clone, or a single branch, you generally get less than if you do a "normal" clone. And, when you run git push, you can choose which particular commit IDs, if any, you deliver to the remote repository, and which branch name(s) you'd like them to use there.

Let's ignore the shallow and single-branch clones for now, though.

Given your example:

  B--C--D  <- master
 /
A--E--F    <- foo-branch

and git push origin master (whose refspec is presumably equivalent to master:master, i.e., you have not configured an unusual push), where your remote origin currently has commit A (it doesn't matter what branch label(s) it has for A, only that it has A) and assuming a smart protocol, the handshake and transfer protocol starts out pretty much like this:

(your git) "what options do you support? I have thin-packs etc"
(their git) "I have thin-packs and ofs-delta and so on"
(your git) "ok, send me all your refs and their SHA-1s"
(their git) "refs/heads/master is <SHA-1 of A>"
(their git) "that's all I have"

At this point, your git knows what commits are required to get all the commits to the remote: these are the commits that would be listed if you ran, in your repository, git rev-list master ^A (fill in the actual SHA-1 of A, of course). There is no need to exclude additional SHA-1s as the remote origin has nothing but the one branch, whose tip is commit A.
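
To make the set arithmetic concrete, here is a toy model of `git rev-list master ^A` on the example graph. It is only a sketch: the single-letter commit names stand in for real SHA-1s, and the `PARENTS` table and both helper functions are made up for illustration rather than taken from git.

```python
# Toy model of `git rev-list master ^A` on the example graph.
# Commit letters stand in for real SHA-1s (an assumption for illustration).
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],  # master points here
    "E": ["A"],
    "F": ["E"],  # foo-branch points here
}


def reachable(tips):
    """Return every commit reachable from the given tips, tips included."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def rev_list(include, exclude):
    """Commits reachable from `include` but not from `exclude`."""
    return reachable(include) - reachable(exclude)


to_send = rev_list(include=["D"], exclude=["A"])
print(sorted(to_send))
```

E and F never show up in the result because nothing reachable from master leads to them.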

The way this works internally is that git push runs git pack-objects (with --thin), which then runs git rev-list, passing it the commit IDs you've asked to push, with exclusions (--not or prefix ^) for all the commit IDs their git sent you (again in our case that's just the one commit-ID A). See the documentation for git rev-list, paying particular attention to the --objects-edge option (or --objects-edge-aggressive when working with shallow clones).

Your git rev-list therefore outputs the ID of commit D, plus the IDs of its tree and all of that tree's subtrees and blobs, unless it concludes (via the negated IDs, in this case the ^A that excludes commit A) that the remote git must already have them. It then outputs the ID of commit C and its tree, with the same "unless" condition, and so on. Note that commit A has a source tree associated with it; and suppose commit C has the same tree—for instance, suppose commit C is a revert of B. In this case there's no need to send C's tree: the remote must have it because the remote has commit A.
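
The "unless the remote must already have it" rule can be sketched as a set difference over objects. Everything named below is an assumption for illustration: the tree and blob labels are invented, and C is given the same tree as A to mirror the revert scenario. Real git only approximates this difference and may send some objects the remote already has.

```python
# Toy object store: each commit has one tree, each tree lists its blobs.
# All names are hypothetical labels, not real object IDs.
COMMIT_TREE = {"A": "tree1", "B": "tree2", "C": "tree1", "D": "tree3"}
TREE_BLOBS = {
    "tree1": {"readme-v1", "big-v1"},
    "tree2": {"readme-v1", "big-v2"},
    "tree3": {"readme-v2", "big-v1"},
}


def objects_of(commits):
    """Collect the commits plus every tree and blob they reference."""
    objects = set()
    for commit in commits:
        tree = COMMIT_TREE[commit]
        objects.add(commit)
        objects.add(tree)
        objects |= TREE_BLOBS[tree]
    return objects


remote_has = objects_of(["A"])                      # implied by ^A
to_send = objects_of(["D", "C", "B"]) - remote_has  # everything else
print(sorted(to_send))
```

Commit C itself is still sent, but its tree (`tree1`) and the blobs shared with A are not, since the remote must already have them.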

(This object-finding can be optimized via bitmaps. There's a github blog post, I think, describing the development of these bitmaps, which were a solution to the rather slow process of traversing lots of commit graphs so as to find which objects must already be in some remote repository based on some branch tip IDs. This helps them enormously because the fetch process across a smart protocol is symmetric with that of push: we simply swap send and receive roles.)

In any case, the output from your git rev-list feeds your git pack-objects --thin. This provides all the object IDs to take (commit D, its tree if needed, and any needed subtrees and blobs; commit C and needed objects; commit B and needed objects), and also IDs specifically not to take: commit A and its objects, and if there were commits before A, those and their objects. The pack-objects step makes a delta-compressed pack in which the "take these objects" objects are compressed against the "don't take these other objects" objects.

As a super-simplified example, suppose that the tree for A includes a 10 MB file whose last line is "The end". Suppose that the tree for B has a file that's almost the same, except the words "The end" are removed. Git can compress this file into the instructions "start with blob <id-of-file>, then remove the last line." These instructions are much less than 10 MB long and are allowed in the "thin pack".
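
A minimal sketch of the idea behind that delta, assuming a simplified instruction format with only "copy a range from the base" and "insert literal bytes". Git's real pack delta encoding is a compact binary format; the tuples and function names here are hypothetical.

```python
def apply_delta(base, delta):
    """Rebuild the target from a base blob and a list of instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:  # ("insert", literal_bytes)
            out += op[1]
    return bytes(out)


def make_truncation_delta(base, target):
    """Handle only the case from the text: target is base minus its tail."""
    if not base.startswith(target):
        raise ValueError("this sketch only handles truncation")
    return [("copy", 0, len(target))]


last_line = b"The end\n"
base = b"some line\n" * 1_000_000 + last_line  # roughly 10 MB, as in A's tree
target = base[:-len(last_line)]                # the version in B's tree

delta = make_truncation_delta(base, target)
print(len(delta), "instruction(s) instead of", len(target), "bytes")
```

The delta is only useful to a receiver that already holds `base`, which is exactly why a thin pack is acceptable here: the remote has commit A and therefore that blob.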

It's this "thin pack" that is sent over the Internet-phone connection (or whatever datawire connects the two git instances). The receiver then "thickens" the pack into normal git packs (normal packs do not allow delta-compression against an object that is not already in the pack).

OK, that's quite long, but it boils down to: your git won't send F (because you didn't ask it to), nor E (because you're not sending F), nor will it look at the two trees attached to those two commits. But this does depend on the exact command you use, and whether you're using a smart protocol.

If you run git clone without --single-branch, your clone operation starts by calling up the remote as usual, and getting a list of all that remote's references (just like push!). To see these, use git ls-remote:

From git://git.kernel.org/pub/scm/git/git.git
aa826b651ae3012d1039453b36ed6f1eab939ef9    HEAD
fdca2bed90a7991f2a3afc6a463e45acb03487ac    refs/heads/maint
aa826b651ae3012d1039453b36ed6f1eab939ef9    refs/heads/master
595b96af80404335de2a8c292cee81ed3da24d29    refs/heads/next
60feb01a0d7c7d54849c233d2824880c57ff9e94    refs/heads/pu
7af04ad560ab8edb07b498d442780a6a794162b0    refs/heads/todo
d5aef6e4d58cfe1549adef5b436f3ace984e8c86    refs/tags/gitgui-0.10.0
3d654be48f65545c4d3e35f5d3bbed5489820930    refs/tags/gitgui-0.10.0^{}

[hundreds more snipped]

Your git then requests just about everything from the remote. (In this case the "just about" is unnecessary, but if they present you with refs/ other than heads/ and tags/ you might not get those. You also get some control over what tags your git brings over. The details here are a bit messy, but in most normal repositories, a clone will bring over all the tags.)

You're tripping over a faulty assumption when you say this:

I know I typically get one branch (seems to be usually master), but do I also have local copies of commits for other branches, even if I don't get the pointers to their heads?

Your git asks for, and gets, all their branches. But your git renames them too. They're all renamed to live within the refs/remotes/ name-space, under the name of the remote (normally origin, but -o <name> or --origin <name> changes this). Their refs/heads/master becomes your refs/remotes/origin/master; their refs/heads/maint becomes your refs/remotes/origin/maint; and so on.

You will see all of these (abbreviated somewhat) by running git branch -r, which tells git branch to show remote-tracking branches. (And again, "remote-tracking branches" are just those branches whose full name starts with refs/remotes/. A git fetch from a particular remote updates the corresponding remote-tracking branches via the fetch = directives in the repo's configuration entry for that remote.)
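
The renaming follows from the remote's fetch refspec. The sketch below assumes the refspec a default clone usually writes, `+refs/heads/*:refs/remotes/origin/*`, and uses a hypothetical `map_ref` helper to show how their ref name becomes yours. It handles a single `*` wildcard only and is not git's own matching code.

```python
DEFAULT_FETCH = "+refs/heads/*:refs/remotes/origin/*"  # assumed default


def map_ref(refspec, ref):
    """Return the local name for a remote ref, or None if it does not match."""
    if refspec.startswith("+"):  # a leading + allows non-fast-forward updates
        refspec = refspec[1:]
    src, dst = refspec.split(":", 1)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    long_enough = len(ref) >= len(prefix) + len(suffix)
    if long_enough and ref.startswith(prefix) and ref.endswith(suffix):
        middle = ref[len(prefix):len(ref) - len(suffix)]
        return dst.replace("*", middle, 1)
    return None


for theirs in ("refs/heads/master", "refs/heads/maint", "refs/tags/gitgui-0.10.0"):
    print(theirs, "->", map_ref(DEFAULT_FETCH, theirs))
```

Tags do not match this refspec, which is consistent with them being handled by separate rules.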

The master that you see if you run git branch or git status is actually created as a last step in your clone. It doesn't actually run git checkout—it has the same code built in directly—but in essence, your clone, as its final operation, runs git checkout branch-or-sha1 for some branch name (or, as a last ditch attempt, a raw SHA-1 giving a "detached HEAD"). The name used is:

  • the one you supplied as an argument to git clone, or
  • the branch that the remote git's HEAD points to, if your git can figure this out, or if it was provided during protocol negotiation. [1]

If those fail—and assuming you didn't instruct the clone process not to do a checkout—git clone does a checkout of the raw SHA-1 it got from the remote as the remote's HEAD. (In the example ls-remote output above this is aa826b651ae3012d1039453b36ed6f1eab939ef9.)
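
The selection order can be sketched as a small function. This is a toy model built on assumptions: `choose_checkout` and its parameters are hypothetical, the refs are a subset of the ls-remote output above, and treating an ambiguous SHA-1 match as "fall back to detached HEAD" is a simplification that real git may not share.

```python
# A few refs from the ls-remote listing above.
REMOTE_REFS = {
    "refs/heads/maint": "fdca2bed90a7991f2a3afc6a463e45acb03487ac",
    "refs/heads/master": "aa826b651ae3012d1039453b36ed6f1eab939ef9",
    "refs/heads/next": "595b96af80404335de2a8c292cee81ed3da24d29",
}
REMOTE_HEAD = "aa826b651ae3012d1039453b36ed6f1eab939ef9"
HEADS = "refs/heads/"


def choose_checkout(refs, head_sha, requested=None, head_symref=None):
    """Return ("branch", name) or ("detached", sha) for the final checkout."""
    if requested is not None:  # name given as an argument to the clone
        return ("branch", requested)
    if head_symref is not None and head_symref.startswith(HEADS):
        return ("branch", head_symref[len(HEADS):])  # told during negotiation
    matches = [
        name[len(HEADS):]
        for name, sha in refs.items()
        if name.startswith(HEADS) and sha == head_sha
    ]
    if len(matches) == 1:  # unambiguous guess from the SHA-1
        return ("branch", matches[0])
    return ("detached", head_sha)  # simplification: no usable branch name


print(choose_checkout(REMOTE_REFS, REMOTE_HEAD))
```

With the sample refs, only refs/heads/master carries the same SHA-1 as HEAD, so the sketch picks master.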

[1] Note that HEAD comes across as a raw SHA-1. For a long time, there was a bug in git where, if this SHA-1 corresponded to at least two branch names, git clone didn't know which branch to check out. Because smart protocols start by negotiating options, though, the git folks were able to add an option by which one git tells another "HEAD points to branch X". So now, even if the imported HEAD matches multiple imported refs/heads/* names, git can tell which one to use.
