Git create branch that is equal to master of fork parent


Question


I want a branch of my fork to be equal to the master branch of the fork parent. However, the master branch on my fork is ahead of the parent, and I would like to make changes on a branch before getting rid of the changes on my master branch. How can I do this?

Solution

Let's define a term here (though GitHub do actually define it themselves): the "fork parent" is the repository you were browsing when you decided you wanted to create a GitHub fork. You clicked their "fork" button and now you have, under your account, a repository that is a clone of their GitHub repository.

Since a GitHub fork is a clone—albeit one with added features—we now have two repositories. Both repositories are ordinary Git repositories. They just reside over on GitHub. We can, if we wish, clone either or both repositories to our own laptop or other computer. Let's hold off on doing that yet, so that we only have two Git repositories to deal with for now.

At this point, having pushed the "fork" button, these two clones are identical (except for the added features, but those lie outside Git proper). A Git repository consists primarily of two databases:

  • There is a database of all "Git objects": commits, mainly, and their supporting objects, plus some stuff for annotated tags. These are all read-only: none can ever be changed. The contents of these objects, particularly the commits, are of interest to humans. There's one problem though: the contents are found by their hash IDs, big ugly numbers written in hexadecimal, that are useless to humans. They look random (though they aren't random at all) and they are unpredictable.

  • Separate from the database of objects, there's a database of names: branch names, tag names, and other names. These names are human-readable and often actually mean something to humans. What's in the database-of-names are the hash IDs of the commits (and the annotated tag data, for annotated tags).
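The two databases can be inspected directly. This is a minimal sketch, assuming git is installed; the throwaway directory, the demo identity, and the commit message are made up for illustration:

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main   # name the first branch main, whatever the local default is
git commit -q --allow-empty -m "first commit"

# Names database: each entry maps a readable name to a hash ID.
git for-each-ref

# Objects database: look up the object that the name main resolves to.
hash=$(git rev-parse main)
obj_type=$(git cat-file -t "$hash")     # the kind of object stored under that hash ID
git cat-file -p "$hash"                 # its contents: tree, author, committer, message
echo "main -> $hash ($obj_type)"
```

The name is the only human-friendly part; everything else is found by hash ID.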

A regular clone, like the one we would make on a laptop, copies all the objects and none of the branches. But the clone made by a GitHub fork button copies both. So right now, our fork has the same set of branches as their fork. So at this point, everything is equal.

Over time, though, they get unequal, because Git repositories generally have commits (and other objects) added to their commit-and-other-objects databases. When that happens with our fork (our GitHub repository), our repository gets "ahead of" theirs. When that happens with their fork, their repository gets "ahead of" ours. When it happens to both, they diverge. This is entirely normal and natural.

You would like to resolve this divergence somehow. But you have conditions:

However, the master branch on my fork is ahead of the parent, and I would like to make changes on a branch before getting rid of the changes on my master branch. How can I do this?

At this point, we need some kind of understanding of what a branch is. Saying branch X is ahead of branch Y or branches X and Y have diverged is all well and good, but what does that really mean? If a Git repository is two databases—and it is—what has actually happened in the two databases?

We already know part of the answer, based on what we said above: some new objects went into the objects database(s). Objects rarely, if ever, get removed from an objects database (and if they do it's all automatic). Git is built to add commits, not take them away.

Given this idea of never removing commits, only ever adding them, what's really critical to the humans is not the commits themselves—though for Git itself, that's all that's really critical—but rather, which commits we can find, and how. And for that, we need the branch-and-other-names database.

The first key to understanding this is that each commit records the hash ID of some earlier commit(s). So from a commit, we can go backwards, to older commits. But no commit ever records the hash ID of any newer commit. That's mainly because we build these things one commit at a time. Each one gets a new, unique, random-looking and unpredictable hash ID. We don't know what the future hash IDs will be: we only know what the past ones were. So a child commit—one that's new, made from some parent (older) commit—is allowed to know the hash ID of its parent, which already exists. But the child does not yet know what children, if any, it will have once it's grown-up. And once the child is born, no part of it can be changed, so it can't record its children.

This gives us a backwards-looking chain: the most recent child, whatever it is, remembers his parent, who remembers her parent—the grandparent of the most recent child—and so on:

... <-F <-G <-H

where each uppercase letter stands in for some random-looking hash ID. But if that's true (and it is), there's still one problem: How will we find the hash ID of the most recent commit?

This is where branch names enter our picture. A branch name just records one hash ID. That one hash ID is the ID of the most recent commit on the branch. So the picture really looks like this:

... <-F <-G <-H   <--main

The name main holds the hash ID of the last commit H. That commit holds a snapshot of all files, and the hash ID of its parent commit G. Commit G holds a snapshot of all files, and the hash ID of an earlier commit F. This repeats for every commit, until we get back to the very first commit ever. That commit—called a root commit—has no parent, because it can't (have a parent). That's also how Git knows to stop going backwards.
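The backwards walk can be watched in a scratch repository. A small sketch, assuming git is installed; the empty commits named F, G, and H only stand in for the letters in the drawing:

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
for msg in F G H; do git commit -q --allow-empty -m "$msg"; done

tip=$(git rev-parse main)                   # hash ID stored under the name main (commit H)
parent=$(git rev-parse main^)               # hash ID that H records as its parent (commit G)
root=$(git rev-list --max-parents=0 main)   # the commit that records no parent (commit F)

# git log starts at the commit the name selects and follows parent links backwards.
git log --format='%h %s' main
```

Git stops at the root commit because there is no further parent hash ID to follow.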

To add to a branch, we start by extracting the latest commit:

...--G--H   <-- main

We extract the contents of commit H. We do some work with that and then make a new commit. The new commit gets a new, unique hash ID—which is unique across every Git repository everywhere in the universe (this is why the hash IDs are so big and ugly)—and that commit and its supporting objects go into the big database of objects:

...--G--H   <-- main
         \
          I

Git makes sure that new commit I, before it's created and thus acquires its hash ID, has H's hash ID stored in it, so that I will link back to H. Git knows to use H's hash ID because the name main still points to H. But now that I is in the big database, Git does its last little trick: it writes the hash ID of I into the names database, under the "branch name main" entry. So now we have:

...--G--H
         \
          I   <-- main

and we can draw this as a straight line if we prefer:

...--G--H--I   <-- main
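The same sequence as a sketch in a scratch repository (assuming git is installed; the commit messages G, H, and I are placeholders for the letters above): the name main is read before and after the new commit, and the new commit's recorded parent is compared with the old tip.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H

before=$(git rev-parse main)            # main selects H
git commit -q --allow-empty -m I        # new commit I is written with H as its parent
after=$(git rev-parse main)             # main now selects I
recorded_parent=$(git rev-parse main^)  # the parent hash ID stored inside I

echo "main moved from $before to $after"
echo "I records parent $recorded_parent"
```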

At this point, the two clones have diverged: one has G-H-I with its main pointing to I, and the other has G-H with its main pointing to H. The divergence is that one is strictly ahead of the other. Which one is "ahead"? The one we added commit I to, of course. The other one is "behind".

If we, or anyone, now add some new commit to the "behind" repository, that new commit gets a new, totally-unique hash ID. Let's call it J for short though. So now we'll have:

Repo 1:

...--G--H--I   <-- main

Repo 2:

...--G--H--J   <-- main

Note how, in each repository, the name main selects the last commit. The last commits differ in the two repositories, and each repository is somehow both ahead of, and behind, the other.

If we want to synchronize the two repositories, we have a big problem. Let's say we wish to synchronize Repo 2 to Repo 1. We can do that by first grabbing commit I from Repo 1 and shoving it into Repo 2. Because the hash IDs of commits are totally unique, H is the same in both repositories, but I and J are different, so now Repo 2 has:

          I   ???
         /
...--G--H--J   <-- main

The fact that we've drawn commit I on a separate line is not important (for the same reason we were able to move it around earlier). But the fact that the name main does not point to it, is important, because it's that name that we want to use to find I. If we make Repo 2's main point to I, we get:

          I   <-- main
         /
...--G--H--J   ???

which we can draw as:

 ...--G--H--I   <-- main
          \
           J   ???

if we like, but no matter how we draw it, we can't find commit J any more.

Rescuing commit J

What if we make a new name to point to commit J? Let's start Repo 2 off as before, without I but with J as the last commit on main. In fact, let's put several commits in Repo 2 that aren't in Repo 1:

...--G--H--J--K--L   <-- main

Let's add a new name that also points to commit L now:

...--G--H--J--K--L   <-- main, extra-name

This extra-name is a branch name, so like any branch name, it points to some commit. We chose L as the commit we want it to point to. Why did we choose L? That's obvious, isn't it? It's the commit that main points to.

Now that we've done this, let's grab commit I from Repo 1, and change our existing main to point to commit I, just like their main:

...--G--H--I   <-- main
         \
          J--K--L   <-- extra-name

Now we can find commit L. Commit L points back to commit K, which points back to commit J, which points back to commit H. None of our commits have changed at all. The only thing that did change is the name main, which now points to commit I, which we got from their repository.
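The rescue can be rehearsed entirely on one machine. This is only a sketch under assumptions: two local directories stand in for Repo 1 and Repo 2, empty commits stand in for the lettered commits, and git is installed.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Repo 1 starts with G and H; Repo 2 begins as a copy of it.
git init -q "$work/repo1"
cd "$work/repo1"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H
git clone -q "$work/repo1" "$work/repo2"

# Repo 1 gains I; Repo 2 gains J, K, and L.
git commit -q --allow-empty -m I
cd "$work/repo2"
for msg in J K L; do git commit -q --allow-empty -m "$msg"; done

git branch extra-name main        # second name for L, so L stays findable
git fetch -q origin               # bring commit I over from Repo 1
git reset -q --hard origin/main   # move main (the current branch) to I

git log --graph --format='%h %s %d' main extra-name
```

After the reset, main selects I while extra-name still leads back through L, K, J, H, and G.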

The mechanics of doing all of this

The problem with using GitHub to do this is that GitHub has an inflexible, limited interface: the web interface. It's pretty good for what it does—for accessing all the added features that GitHub forks give you—but it's not very good at doing the basic stuff that command-line Git does. Hence, it turns out that the way to do this is with command-line Git.

The one problem with this is that command-line Git is slightly different. We have a way to copy a repository, using git clone. But this copy operation is different from GitHub's "fork a repository" clone. What we do is run:

git clone ssh://git@github.com/user/repo.git

on our laptop or whatever kind of computer we have. (I've used an ssh:// URL here; you can use https:// if you prefer, though GitHub are starting to push people towards using ssh.) This:

  • makes a new, empty repository;
  • uses git remote add to add the URL under a short name by which we can refer to the GitHub repository later: the default short name is origin;
  • copies all of the commits that exist in this repository, but does not copy the branch names; and then
  • does a git checkout, which creates one branch name in our new local repository.

(We then have to cd repo or otherwise enter the new repository, because git clone can't make our command-line-interpreter do that for us. So we'll do that too.)
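The steps listed above can be run one at a time to see what each contributes. This is a rough approximation rather than exactly what git clone does internally (a real clone also records origin/HEAD, for instance); the local directory named theirs is a made-up stand-in for the GitHub repository.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical source repository with two branches, main and develop.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git branch develop

# Roughly the steps that git clone performs:
git init -q "$work/manual"                  # 1. new, empty repository
cd "$work/manual"
git remote add origin "$work/theirs"        # 2. short name for the URL
git fetch -q origin                         # 3. copy commits; their branches become origin/*
git checkout -q -b main origin/main         # 4. create one local branch name

git for-each-ref --format='%(refname)'
```

Only refs/heads/main exists locally; develop is visible only as refs/remotes/origin/develop.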

The interesting thing here is that our Git doesn't copy their branch names. That's because our Git wants to let us make up our own branch names, which need not match theirs at all. We will probably want to have at least some of our branch names match at least some of theirs, but our Git chooses not to force this on us, even if it would make sense. Our Git is tuned to "power user" mode right from the start, even if we're Git beginners.[1] What our Git does with their Git's branch names is to change them into remote-tracking names.[2]

These remote-tracking names are formed by taking their branch names and shoving origin/ in front of them. Technically, we shove in whatever name we used for the remote in the git remote add step, and the actual full name is refs/remotes/<remote>/<name>. And, technically, your own (local) branch names have refs/heads/ shoved in front of them. But some or most of these prefixes don't normally show up:

  • If you run git branch you'll see your (local) branch names shortened to things like main or master.
  • If you run git branch -r, you'll see your remote-tracking names shortened to things like origin/main.
  • But for some reason,[3] if you run git branch -a, your remote-tracking names are shortened to remotes/origin/main instead.

You can, however, run git for-each-ref, which finds all refs. Branch names and remote-tracking names are just two forms of reference. Tag names are a third form, and Git has a bunch more forms. The for-each-ref command, which isn't really meant for ordinary users, lists all of them, printing out their full names by default, and the objects—well, hash IDs and types—they point-to.
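The different spellings are easy to compare side by side. A sketch assuming git is installed, with a local directory standing in for the GitHub repository:

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git clone -q "$work/theirs" "$work/mine"
cd "$work/mine"

git branch          # local names, shortened:           main
git branch -r       # remote-tracking names, shortened: origin/main
git branch -a       # both, with the longer form:       remotes/origin/main
git for-each-ref    # every ref, full name, plus hash ID and object type
```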


[1] This was almost certainly a really bad idea. Unfortunately, it's too late to change it now: Git has a "don't break existing users' work-flows" philosophy.

[2] Git calls these remote-tracking branch names. The word branch here is redundant, and if you use this phrase, you will be tempted to shorten it to remote-tracking branch, which is sort of OK, but then you will be further tempted to shorten to remote branch, which ... is not OK: it's confusing. Do you mean a branch name as seen on the remote, or do you mean a name as found in your local Git repository, under the group remotes/origin/?

[3] I have no idea why. If this were showing tags, too, it might make sense. But it isn't.


Mechanics, part 2

Now that we have a clone of our fork, we need to add to this clone any commits they have that we don't. That means we need to instruct our Git how to call up GitHub and read from their repository (the fork parent) too.

To do this, we need a second remote. This second remote needs a name. There's a standard second name—though I don't like it much myself—of upstream. You can use that one, or invent one you like better. For this answer I'm going to use upstream here:

git remote add upstream <url>

For the URL, put in ssh://git@github.com/them/their-repo.git or https://github.com/them/their-repo.git or whatever URL one would use to read from their Git repository over on GitHub: the URL of the repository you were browsing when you used the FORK button, perhaps turned into an ssh URL.

(You can, if you like, run:

git ls-remote <url>

first to see if you have the URL correct. This has your Git call up that URL and get information about their branches, tags, and other names. It is a lot like running git for-each-ref, except that it uses the Git protocol to read from some other Git.)

Once you have this set up, run:

git fetch upstream

to have your Git call up their GitHub repository. This starts with the same thing as the git ls-remote—in fact, you can now run git ls-remote upstream—but then does more: it gets, from them, any commits that they have, that you don't have, and adds them to your repository, locally, on your laptop or whatever. Then—this is the Power Git User thing I mentioned earlier—it creates, in your Git repository, remote-tracking names for each of their branches.

Because you named this upstream, these remote-tracking names will have the form upstream/main, upstream/develop, and so on. The upstream/ in front of each one (or remotes/upstream in longer form, or refs/remotes/upstream to use the real full name) keeps these remote-tracking names from interfering with any of your own branch names.[4]
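Adding the second remote and fetching from it can be tried without touching GitHub. In this sketch, local directories play the parts of the fork parent, the fork, and the laptop clone; those paths are assumptions for illustration only.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-in for the fork parent.
git init -q "$work/parent"
cd "$work/parent"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H

# Stand-ins for the GitHub fork (a bare copy) and the laptop clone of that fork.
git clone -q --bare "$work/parent" "$work/fork.git"
git clone -q "$work/fork.git" "$work/laptop"

# The parent gains a commit that the fork has never seen.
git commit -q --allow-empty -m I

cd "$work/laptop"
git remote add upstream "$work/parent"
git ls-remote upstream     # lists their names and hash IDs without fetching commits
git fetch -q upstream      # downloads I and creates upstream/main
git branch -r              # origin/* and upstream/* now sit side by side
```

Local main is untouched by the fetch; only upstream/main records where their main is.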


[4] Note that if you have a branch named upstream or upstream/main of your own, its full name is refs/heads/upstream or refs/heads/upstream/main. This is different from refs/remotes/upstream/main, so the two don't collide. Git will keep them straight. But it's confusing to humans: if you're going to have a remote named upstream, don't name any of your branches upstream/whatever.


You now have the superset that you need

At this point, in your own laptop repository, you have the combination you want/need to get everything done:

  • You have one branch name as a result of your git clone having run git checkout. That one name is probably main or master. The actual name used here is the one that the GitHub Git recommended to your Git, when you ran git clone url. You could pick a different name at git clone time, with git clone -b branch url, for instance, but you didn't, so you got main or master.

  • You have one remote-tracking name of the form origin/* for each branch name in your fork.

  • You have one remote-tracking name of the form upstream/* for each branch name in their fork.

You can now create any new branch names you like, in your local repository, pointing to any existing commit. To do so, just run:

git branch <name> <commit-hash-ID>

or:

git branch <name> <existing-name>

This tells your Git to create the new name as a branch name, pointing to the commit whose hash ID you gave, or whose hash ID was found by turning existing-name into a hash ID.

(To see the process of turning existing names into hash IDs, use git rev-parse: git rev-parse name does exactly that, reading the names database and figuring out the hash ID. Note that you can write main, or heads/main, or even refs/heads/main, to refer to your own branch named main. The exact rules for all of this are listed out in the gitrevisions documentation.)
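Both forms of git branch, and the three spellings accepted by git rev-parse, in one short sketch (assuming git is installed; the branch names from-hash and from-name are invented for the example):

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H

git branch from-hash "$(git rev-parse main^)"   # new name pointing at a raw hash ID (commit G)
git branch from-name main                       # new name pointing wherever main points (commit H)

short=$(git rev-parse main)
medium=$(git rev-parse heads/main)
full=$(git rev-parse refs/heads/main)
echo "$short $medium $full"                     # three spellings, one hash ID
```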

To forcibly move some existing branch name, in your own repository on the laptop, you have two main options:

  • While not "on" that branch, use git branch -f with the name and a hash-ID (or another second name) to force-move the name.
  • Or, if you are "on" that branch, use git reset --hard with a hash ID (or name) to force-move the branch you are currently on, to that hash ID (or the one the name resolves to).

You are "on" some branch after using git checkout or git switch to switch to that branch name. That name becomes your current branch, and git status will say On branch <whatever>. The commit whose hash ID is stored in that branch name is your current commit. The git reset command moves you to some other commit, yanking the branch name along with it; the --hard part tells this git reset to re-set both Git's index and your working tree (neither of which is described in this answer).
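The two ways of force-moving a name can be compared in a scratch repository. A sketch assuming git is installed; the branch name other is made up:

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main
for msg in G H I; do git commit -q --allow-empty -m "$msg"; done
git branch other main              # other and main both select I

# Not on "other": move it with git branch -f.
git branch -f other main~2         # other now selects G

# On main: move it with git reset --hard.
git reset -q --hard main~1         # main now selects H

current=$(git symbolic-ref --short HEAD)
git log --all --format='%h %s %d'
```

Commit I still exists in the objects database after both moves, but no name selects it any more.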

Once you have your (local) branch names set up the way you like, use git push with your remote-name origin. This has your Git call up the Git over at GitHub connected to your fork on GitHub, just like the git fetch upstream did (but to your fork, not their fork, since you're using git push origin and not git push upstream). The push conversation, however, is quite different from the fetch conversation.

A git fetch is always safe. Your Git calls up some other Git. Your Git asks that other Git: What branch and tag and other names do you have? What commits do you have that I don't? Your Git then downloads any "missing" commits and supporting objects, and creates or updates your remote-tracking names. None of your branches get changed. None of your work is affected. You just add new objects to your Git's database-of-all-objects, and update remote-tracking names.

But git push is different. The beginning part is the same: your Git calls up some other Git. Your Git lists out (some or all of) your commits (by hash ID—your branch names don't matter yet, not right at this point), and they'll check to see if they need any new commits and other supporting objects. If they do need them, your Git will package them up and ship them over. But right at this point, the rest is very different. Your Git now asks (git push) or commands (git push --force) their Git to set some of their branch names, based on what branch names you used here.

As a now-power-Git-user, you can use a fancier form of git push:

git push origin <hash-ID-or-name>:refs/heads/<branch-name>

e.g.:

git push origin HEAD:refs/heads/new-branch

Here, by using the colon character, you get to put anything you like on the left side. Your Git will do a git rev-parse on this to figure out which commit(s) to send. Then, on the right side, you can list out a branch name. Sometimes you will need to spell it out in full like this. Sometimes you can just write new-branch. Your Git will ask (this kind of regular push) or command (force-push) that their Git should create or update their branch new-branch using this commit hash ID.

Usually, though, you will probably just do:

git push origin main

Since there's no colon in main, the left and right sides are just main and main, as if you typed git push origin main:main. Your Git and their Git will figure out that you want to send your main commits to them, if they don't have them, and then have them update their name main.
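Both push forms can be tried against a local bare repository. This is a sketch: fork.git is a hypothetical stand-in for the fork on GitHub, and new-branch is the same example name used above.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/fork.git"   # plays the part of the fork on GitHub
git init -q "$work/laptop"
cd "$work/laptop"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git remote add origin "$work/fork.git"

git push -q origin main                         # short for main:main
git push -q origin HEAD:refs/heads/new-branch   # left: commit to send; right: name to set over there

git ls-remote origin                            # their names after both pushes
```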

When do you need force-push?

Whenever you have your Git call up some other Git and give them commits and then ask them to set one of their branch names, they will do some checking. The built-in checking, which every Git does, is pretty simple. (GitHub adds a lot of features for things like "protected branches" that can do more checking. For this, consult the GitHub documentation. We'll ignore this here.) The basic check is just: does this add new commits, or does this take some commits away?

Let's draw a branch again. Suppose they have a branch that goes like this:

...--G--H--I   <-- main

Meanwhile, you have, in your repository:

...--G--H--I--J   <-- main

You run git push origin main. Your Git and their Git confer: you have a commit J that they don't have. Your Git hands it over. J connects back to I. They inspect this and see that J connects back to I. You then have your Git ask their Git: Please, if it's OK, set your main to point to J.

Because J (or even J-K-L, if you had that) just adds on, it is OK. They say so, your git push finishes, and they now have ...--G--H--I--J in their repository with their main selecting commit J. Your Git now updates your origin/main (your memory of origin's main), so your main and your origin/main both select commit J now.

But what if you have:

...--G--H--J   <-- main

because you never brought I in? Then your git push calls up their Git, sends over your J (it's still new to them), and politely asks them to set their main to point to J. They check: J links back to H, not I. If they set their main to point to J, they will drop I from their branch. That is not OK and they will reject your polite request, saying that it is not a fast-forward.

This is when you must use git push --force, if you really need them to drop a commit from this branch.
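The rejected push and the forced one can be reproduced locally. A sketch under assumptions: remote.git is a bare stand-in for the GitHub repository, and the directories theirs and mine are two hypothetical clones that push to it.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# "They" publish G and H.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H
git remote add origin "$work/remote.git"
git push -q origin main

# "We" clone at H; afterwards they add I and we add J.
git clone -q "$work/remote.git" "$work/mine"
git commit -q --allow-empty -m I
git push -q origin main            # fast-forward from H to I: accepted
cd "$work/mine"
git commit -q --allow-empty -m J

# The polite request would drop I from their main, so it is refused.
if git push -q origin main 2>/dev/null; then rejected=no; else rejected=yes; fi
echo "plain push rejected: $rejected"

git push -q --force origin main    # the command form: their main now selects J
```

Git also offers git push --force-with-lease, which forces only if their branch still holds the value your remote-tracking name remembers; it is often a gentler choice when others may be pushing to the same repository.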

Putting it all together

You will want:

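git clone <url1>                 # clone your fork; it becomes the remote named origin
cd <new-clone>
git remote add upstream <url2>   # the fork parent
git fetch upstream               # creates upstream/main and the other upstream/* names
git branch new-branch main       # assuming main, not master; saves your current commits
git reset --hard upstream/main   # you are on main, so this moves main to match the parent
git push origin new-branch       # remember the last commit from your old `main`
git push -f origin main          # drop commits from your fork's main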

Before you run each of these, make sure that you understand what they do, and that I didn't goof any of this up. I don't have these two forks and hence can't test this exact set of commands.
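One way to rehearse the sequence before touching the real fork is to run it against throwaway local repositories. The following is only a sketch: parent.git and fork.git are bare directories standing in for the two GitHub repositories, and the empty commits G through L mirror the drawings above. None of it reflects the asker's actual repositories.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the fork parent: commits G and H on main.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H
git clone -q --bare "$work/seed" "$work/parent.git"

# Stand-in for pressing the fork button: a bare copy, branch names included.
git clone -q --bare "$work/parent.git" "$work/fork.git"

# The parent moves on with commit I ...
git commit -q --allow-empty -m I
git push -q "$work/parent.git" main

# ... while the fork's main gains J, K, and L.
git clone -q "$work/fork.git" "$work/tmp"
cd "$work/tmp"
for msg in J K L; do git commit -q --allow-empty -m "$msg"; done
git push -q origin main

# The recipe itself, with local paths in place of <url1> and <url2>.
git clone -q "$work/fork.git" "$work/clone"
cd "$work/clone"
git remote add upstream "$work/parent.git"
git fetch -q upstream
git branch new-branch main
git reset -q --hard upstream/main
git push -q origin new-branch
git push -q -f origin main

git ls-remote origin    # the fork: main matches the parent, new-branch keeps J-K-L
```

If the rehearsal ends with the fork's main equal to the parent's main and new-branch still selecting L, the same commands should behave the same way against the real URLs.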
