Difference between creating a branch and doing a soft reset? Best way to go back to an old working version?


Question


Say the history of my commits is A - B - C and I have only this branch.

B was fully working. I started adding some functionality in C, but it's not working so I need to go back to B, but I also want to retain the code I wrote in C because I will want to review it and fix it later. What is the best way to do it?

Is the best way to create a new branch starting from B?

What is the difference between that and doing a soft reset? I understand a soft reset doesn't delete the changes (is that correct?) but it's not clear to me how to restore those changes (the code in C), nor what the difference between a soft reset and creating a branch is.

Aside

Git just seems needlessly arcane and obscure. I mean, the official docs define push as:

https://git-scm.com/docs/git-push

git-push - Update remote refs along with associated objects

I am sure it is technically correct, but it is hardly the most user-friendly explanation. Could they have added a comment explaining it uploads the local repository to the remote one, or something like that?

Answer

All the answers here are OK. What's missing is, well ... this is where your rant comes in. :-) Your professor-quote here is quite apposite:

One of my best professors at uni always said: beware of those who try to dumb down very complex concepts, but also beware of complexity for its own sake: those who cannot explain a simple concept in a simple way either want to show off or do not really understand the concept themselves!

Or, as Einstein supposedly put it, "Make everything as simple as possible, but no simpler."

Unfortunately, what Git does (distributed source code control) is inherently complex. Fortunately, there are some simple ways to get started. Unfortunately, traditional books, and Git's documentation itself, do this not-so-well, in my opinion. The Pro Git book is, I think, pretty good (and has the advantage of generally being up-to-date), and some other books that were pretty good are unfortunately terribly out of date now, but most introductions try to start without a proper foundation.

The foundation requires some terminology as well. This is probably where the Git manual pages fail the hardest. They just spray terminology—sometimes inconsistent terminology, although this has improved over time—all over the place. This has led to some pretty funny web pages. (I think a lot of introductions to Git shy away from terminology because the core foundation of Git lies in graph theory and hashing theory, and people find the mathematical aspect of those scary.)

Git itself makes things harder than necessary. A simple existence proof of that is Mercurial. Mercurial and Git are, at least in terms of what they do with source code, equally powerful—but those new to distributed source control have far fewer problems getting started in Mercurial than they do in Git. It's not 100% clear why that is, but I think there are two key things Mercurial does differently that produce this result:

  • In Mercurial, branches are global and permanent. This is very convenient for beginning work but, at least sometimes, proves to be a trap. Mercurial eventually added bookmarks that work like Git's branches.

  • Mercurial does not have the thing that Git calls the index.

These aren't the only things—Git has a lot of other, smaller annoyances as well that just aren't there in Mercurial—but I think they are the big two. For instance, the entire question of git reset doesn't occur in Mercurial because git reset (a) manipulates branch pointers—Mercurial has those bookmarks instead, if you choose to use them—and (b) manipulates the index that Mercurial doesn't even have.

My own answer: what's going on

Anyway, the key here is these three things. (Here comes some terminology!)

  1. In Git, a branch name is little more than a name-to-hash-ID mapping. What matters are the commits.

  2. A commit is a unique entity, identified by a unique hash ID like b5101f929789889c2e536d915698f58d5c5c6b7a, that stores (permanently [1] and unchangeably) a snapshot of files and some metadata, including the hash ID(s) of some other commit(s).

  3. The index is the area that Git actually uses to build new commits.


[1] Well, as permanent as the commit, anyway. Commits eventually go away if there's no way to find them. This is where branch names and graph theory come in, but we'll get to that later.


What to know about the index

Let's just start with this observation: when a commit stores a snapshot of all of your files, it keeps them in a compressed, read-only, Git-only storage form. They're sort of frozen or freeze-dried, as it were. No one can change them at all. That's fine for archival—saving old source code—but completely useless for getting any new work done.

To get work done, you need a place where your files are unfrozen, rehydrated, readable and writable, in their normal everyday form. That place is what Git calls your work-tree. Git could stop here—frozen commits and flexible work-tree—and that's what Mercurial does and it works fine. But for whatever reason, Git adds this thing it calls the index, or sometimes the staging area, or even the cache. (The name used depends on who / what is doing the naming, but all three are the same thing. Also, the index itself is more complicated than I'll go into, but we don't need to worry about these complications here.)

What the index stores is Git-ified copies of files. They're not exactly frozen, but they are in the same format—the freeze-dried format, as it were. They're not useful to you; they're only useful to Git. Why it does this is debatable, but it does this, and you need to know about it. What it does with this is that the index is how Git makes new commits.

When you run:

git commit -m "this is a terrible log message"

Git will package up whatever is in the index right now, along with your metadata (your name and email address, the log message, and so on), and turn that into a new commit. The stuff in your work-tree, where you're doing your work, is entirely irrelevant! The fact that everything is already prepared (already freeze-dried, as it were) is what makes git commit so fast. Mercurial's hg commit, which commits what's in your work-tree, has to check every file in your work-tree to see if it's the same as the previous one or not, and if not, prepare the freeze-dried form for the commit. So in a big project you run hg commit and then go out for coffee or whatever. [2] But with Git, if you change a file in the work-tree, Git makes you run:

git add file

every time. This copies the file—while freeze-drying or Git-ify-ing it—into the index.

Hence, the index always contains the next commit you're proposing to make. If you make some changes to the work-tree, and want them in your next commit, you have to explicitly copy them into the index before you run git commit. You can use git commit -a to have Git scan your work-tree and do the adds for you, making Git act the way Mercurial would if you were using Mercurial. That's certainly convenient and lets you not think about the index, or even pretend it's not there. But I think it's a bad plan because then git reset becomes inexplicable.
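To see that git commit really does snapshot the index rather than the work-tree, here is a minimal sketch meant to be run as a script. The temporary directory, the file name file, and the identity settings are illustrative assumptions, not part of the original answer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "version 1" > file
git add file                  # the index copy now holds "version 1"
echo "version 2" > file       # only the work-tree copy changes
git commit -q -m "snapshot of the index"

committed=$(git show HEAD:file)   # the frozen copy inside the new commit
worktree=$(cat file)              # the ordinary file on disk
echo "committed: $committed"
echo "work-tree: $worktree"
```

The edit made after git add never reaches the commit, which is the behavior that git commit -a papers over.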


[2] It's usually not that bad, and in a small project the difference is nearly undetectable. Mercurial uses a lot of cache tricks to speed this up as much as it can, but (unlike Git) it keeps those out of the way of the user.


Commits

Now let's look closely at what, exactly, goes into a commit. I think the best way to see this is to look at an actual commit. You can look at your own with:

git cat-file -p HEAD

but I'll show this one from the Git repository for Git like this:

$ git cat-file -p b5101f929789889c2e536d915698f58d5c5c6b7a | sed 's/@/ /'
tree 3f109f9d1abd310a06dc7409176a4380f16aa5f2
parent a562a119833b7202d5c9b9069d1abb40c1f9b59a
author Junio C Hamano <gitster pobox.com> 1548795295 -0800
committer Junio C Hamano <gitster pobox.com> 1548795295 -0800

Fourth batch after 2.20

Signed-off-by: Junio C Hamano <gitster pobox.com>

Note the tree and parent lines, which refer to additional hash IDs. The tree line represents the saved source code snapshot. It might not be unique! Suppose you make a commit, then later, go back to an old version but save that as a new commit on purpose. The new commit can re-use the original commit's tree, and Git will do just that, automatically. This is one of many tricks Git has up its sleeves for compressing archived snapshots.
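The tree re-use described above can be checked in a scratch repository. This is a sketch under assumptions: the file name, the commit messages, and the identity settings are made up for the demonstration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "one" > file; git add file; git commit -q -m "first version"
echo "two" > file; git add file; git commit -q -m "second version"
echo "one" > file; git add file; git commit -q -m "back to the first version, on purpose"

old_tree=$(git rev-parse 'HEAD~2^{tree}')   # tree of the oldest commit
new_tree=$(git rev-parse 'HEAD^{tree}')     # tree of the newest commit
echo "old tree: $old_tree"
echo "new tree: $new_tree"
```

The two commits have different hash IDs (their parents, dates, and messages differ), yet both name the same tree object because the saved files are identical.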

The parent line, though, is how Git commits become a graph. This particular commit is b5101f929789889c2e536d915698f58d5c5c6b7a. The commit that comes before this commit is a562a119833b7202d5c9b9069d1abb40c1f9b59a, which is a merge commit:

$ git cat-file -p a562a119833b7202d5c9b9069d1abb40c1f9b59a | sed 's/@/ /'
tree 9e2e07ce274b0a5a070d837c865f6844b1dc0de8
parent 7fa92ba40abbe4236226e7d91e664bbeab8c43f2
parent ad6f028f067673cadadbc2219fcb0bb864300a6c
author Junio C Hamano <gitster pobox.com> 1548794876 -0800
committer Junio C Hamano <gitster pobox.com> 1548794877 -0800

Merge branch 'it/log-format-source'

Custom userformat "log --format" learned %S atom that stands for
the tip the traversal reached the commit from, i.e. --source.

* it/log-format-source:
  log: add %S option (like --source) to log --format

This commit has two parent lines, giving two more commits. That's what makes this a merge commit in the first place.

What all this means is that if we throw out the notion of looking at the source code (we can bring it back any time by using the tree lines from each commit—every commit has one), we can view the commits themselves as just a linked series of nodes in a graph, each with its own unique hash ID, each of which remembers the hash ID of some predecessor or parent nodes.
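As a rough illustration of parent links forming a graph, the following sketch builds a tiny history with one merge and reads the parent hash IDs back out. The branch name side, the empty commits, and the identity settings are assumptions chosen only to keep the example short.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master  # fix the initial branch name
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "J"
j=$(git rev-parse HEAD)
git checkout -q -b side
git commit -q --allow-empty -m "L"
l=$(git rev-parse HEAD)
git checkout -q master
git merge -q --no-ff -m "M: merge side" side   # --no-ff forces a real merge commit

set -- $(git log -1 --format=%P HEAD)    # %P prints the parent hash IDs
nparents=$#
first_parent=$1
second_parent=$2
echo "M has $nparents parents"
```

An ordinary commit would report one parent here, and the very first commit in a repository would report none.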

We can draw these like this:

A <-B <-C

for a simple three-commit repository, or:

...--I--J--M--N
  \       /
   K-----L

for a more complicated repository with a merge as the parent of the last commit (on the right). We use one uppercase letter to stand in for the actual, apparently-random hash ID, because hash IDs are unwieldy (but single letters are pretty wieldy). The arrows, or connecting lines, from a child commit back to its parent(s) are the parent lines in the actual commit.

Remember, again, that all these commits are frozen in time, forever. We cannot change any aspect of any of them. We can of course make a new commit (from the index as usual). If we don't like commit C or commit N, we can make a replacement for it, e.g.:

     D
    /
A--B--C

Then we can bend C out of the way and use D instead:

A--B--D
    \
     C

These are the same graph, we're just looking at it differently.

Branch names (and other names but we won't cover them here)

These graph drawings are neat and simple and, I'll argue, the way to reason about your Git repository. They show the commits, and they hide the ugly hash IDs from us. But Git does actually need the hash IDs—that's how Git retrieves the commits—and we're going to need to remember the last hash ID of any one of these chains. The reason we only need the last one should be obvious now: if we grab hold of, say, commit D, well, commit D stores the actual hash ID of commit B inside itself. So once we know D's hash, we use D to find B. Then we use B to find A, and—since A is the very first commit and therefore has no parent—we can stop and rest.

So we need one more addition to our drawing here. What we need is a branch name. The name simply points to (i.e., contains the actual hash ID of) the last commit! We can draw this as:

A--B--D   <-- master
    \
     C

The name, master, holds the hash ID of the last commit. From there we find the previous commits. What Git stores for us is:

  • all of the commits, by hash ID
  • some set of names, each of which holds one hash ID

and that—except for all the complications with index and work-tree—is how Git works. To make a new commit E, we just snapshot the index, add the metadata (our name, email address, etc) including the hash ID of commit D, and write that into the commit database:

        E
       /
A--B--D   <-- master
    \
     C

and then have Git automatically update the name master to point to the new commit we just made:

        E   <-- master
       /
A--B--D
    \
     C

Now we can straighten out the kink:

A--B--D--E   <-- master
    \
     C

What about poor lonely commit C, though? It has no name. It has some actual big ugly hash ID, but how, without a name or memorizing that hash ID, will we ever find commit C?

The answer is that Git will eventually delete C entirely unless we give it a name. The obvious name to use is another branch name, so let's do that:

A--B--D--E   <-- master
    \
     C   <-- dev

Now we have two branches, master and dev. The name master means "commit E" and the name dev means "commit C", at the moment. As we work with the repository and add new commits to it, the hash IDs stored under these two names will change. This leads to our key observation: In Git, the commits are permanent (mostly) and unchangeable (entirely), but the branch names move. Git stores the graph—these chains of commits with their internal arrows connecting them, in this backwards-looking fashion—for us. We can add to it any time we want, by adding more commits. And, Git stores a name-to-hash-ID mapping table for us, with branch names holding the hash ID of starting points (or ending points?) in the graph.

The Git terminology for those starting / ending points is tip commit. The branch name identifies the tip commit.
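A small sketch of "names hold hash IDs, and names move": it creates a second branch name at the current commit, then commits once more and compares what the two names resolve to. The simplified two-commit history and the identity settings are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master  # fix the initial branch name
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "D"
d=$(git rev-parse HEAD)
git branch dev                        # dev now stores the same hash ID as master
git commit -q --allow-empty -m "E"    # HEAD is attached to master, so only master moves

echo "master -> $(git rev-parse --short master)"
echo "dev    -> $(git rev-parse --short dev)"
```

After the second commit, dev still names D while master names E, whose parent is D.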

HEAD, and git checkout and the index and the work-tree

Now that we have more than one branch name in our repository, we need some way to remember which branch we're using. This is the main function of the special name HEAD. In Git, we use git checkout to select some existing branch name, such as master or dev:

$ git checkout dev

results in:

A--B--D--E   <-- master
    \
     C   <-- dev (HEAD)

By attaching the name HEAD to a branch name like dev, Git knows which branch we're working on now.

As a crucial side effect, Git also:

  • copies all the files from C into the index, ready for the next commit, and
  • copies all the files from C/the-index into the work-tree, so we can see and use them.

Git may also need to remove some files, if we were on commit E and it has files that aren't there in C. It will remove them from both the index and the work-tree. As usual, Git makes sure that all three copies of every file match up. If there is a file named README in commit C, for instance, we have:

  • HEAD:README: this is the frozen Git-ified copy in commit C, now accessible under the special name HEAD.
  • :README: this is the index copy. It matches the HEAD:README at the moment, but we can overwrite it with git add.
  • README: this is a regular file. We can work with it. Git doesn't really care very much about that—we'll need to copy it back into :README if we change it!
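The three copies listed above can be made to differ and then read back individually. This sketch assumes a scratch repository and invented file contents; the HEAD:README and :README spellings are the ones used in the list.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "committed text" > README
git add README
git commit -q -m "add README"

echo "staged text" > README
git add README                     # overwrites the index copy, :README
echo "work-tree text" > README     # changes only the ordinary file

head_copy=$(git show HEAD:README)  # frozen copy in the current commit
index_copy=$(git show :README)     # copy that the next commit would use
worktree_copy=$(cat README)        # the file you actually edit
printf '%s\n' "$head_copy" "$index_copy" "$worktree_copy"
```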

So, with one action—git checkout master or git checkout dev—we:

  • re-attach HEAD;
  • fill the index; and
  • fill the work-tree

and are now ready to work: we run git add on files to copy them back into the index, and git commit to make a new snapshot that adds to the branch and makes the branch name refer to the new commit. Let's make a new commit F on dev:

... edit some file(s) including README ...
git add README                    # or git add ., or git add -u, etc
git commit -m "another terrible log message"

and now we'll have:

A--B--D--E   <-- master
    \
     C--F   <-- dev (HEAD)

Git knows to update dev, not master, because HEAD is attached to dev, not master. Note, too, that since we made commit F from whatever is in our index right now, and we just made the index match the work-tree, now F, the index, and the work-tree all match up. That's just what we'd have if we just now ran git checkout dev!

This is where git reset comes in

Except for the special case of an unreachable commit that eventually gets deleted, the graph itself can only be added-to. The branch names, however, we can move around any time we like. The main command for doing this is git reset.

Suppose, for instance, that commit F is awful—it's a mistake, we just want to forget it entirely. What we need to do is move the name dev so that instead of pointing to F, it points to C again—F's parent.

We can find the hash ID of commit C and, rudely, just write that directly into the branch name. But if we do that, what about our index and work-tree? They'll still match the contents of commit F. We'll have the graph:

A--B--D--E   <-- master
    \
     C   <-- dev (HEAD)
      \
       F

but the index and work-tree won't match C. If we run git commit again we'll get a commit that looks almost exactly the same as F—it will share the tree, and just have a different date stamp and maybe a better log message. But maybe that's what we want! Maybe we wanted to just fix our terrible log message. In that case, making a new G from the current index would be the answer.

That's what git reset --soft does: it lets us move the branch name to point to a different commit, without changing the index and work-tree. We discard F, then make a new G that's just like F but has the right message. F has no name and eventually withers away.
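Here is a minimal sketch of the "fix the log message" use of git reset --soft, with C, F, and G as in the text. The file contents, messages, and identity settings are assumptions.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "base" > file; git add file; git commit -q -m "C"
c=$(git rev-parse HEAD)
echo "feature" >> file; git add file; git commit -q -m "a terrible log message"
f=$(git rev-parse HEAD)

git reset -q --soft HEAD~1                # move the branch name back to C only
git commit -q -m "a better log message"   # G, built from the untouched index
g=$(git rev-parse HEAD)

git log --format='%h %s' -2
```

G shares F's tree and has C as its parent, while F itself is no longer reachable from any branch name.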

But what if we just wanted to get rid of F entirely? Then we'd want the index and work-tree to match commit C. We'll let F wither away as before. But to get the index and work-tree to match C, we need git reset --hard.

Because the index and work-tree are separate entities, we can choose to go halfway. We can move the name dev to point to C, replace the index contents with those from C, but leave the work-tree alone. That's what git reset --mixed does, and git reset --mixed is actually the default for git reset so we don't even need the --mixed part.

All three of these actions have different end-goals: git reset --soft was for "re-do the commit", git reset --hard was for "throw away the commit entirely", and git reset --mixed doesn't have a clear usage in this particular example. So why are they all spelled git reset? That's where your rant applies again: they probably shouldn't be. They're related in that Git has these three things it can do with branch-name-to-commit-hash, and index and work-tree contents:

  1. move the branch name
  2. replace or keep the index contents
  3. replace or keep the work-tree contents

and git reset will either do step 1 and stop (git reset --soft), or do steps 1 and 2 and stop (git reset --mixed / the default), or do all three and stop (git reset --hard). But their purposes aren't related: Git is confusing mechanism ("how we get from here to there") with goal ("get to there").
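The three stopping points can be compared side by side. In this sketch, make_repo and report are hypothetical helper functions written for the demonstration: each run starts from a fresh repository with commits C and F, resets to C in one mode, and records what HEAD, the index, and the work-tree hold afterwards.

```shell
set -e
make_repo() {   # hypothetical helper: fresh repository with commits C then F
  cd "$(mktemp -d)"
  git init -q
  git config user.name "Example User"      # placeholder identity for the demo
  git config user.email "user@example.com"
  echo "from C" > file; git add file; git commit -q -m "C"
  echo "from F" > file; git add file; git commit -q -m "F"
}
report() {      # hypothetical helper: one line describing the three states
  echo "$1: HEAD=$(git log -1 --format=%s) index=$(git show :file) work-tree=$(cat file)"
}

make_repo; git reset -q --soft  HEAD~1; soft=$(report soft)    # step 1 only
make_repo; git reset -q --mixed HEAD~1; mixed=$(report mixed)  # steps 1 and 2
make_repo; git reset -q --hard  HEAD~1; hard=$(report hard)    # steps 1, 2 and 3
printf '%s\n' "$soft" "$mixed" "$hard"
```

In every case the branch name ends up at C; the modes differ only in whether the index and the work-tree still carry F's content.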

Conclusion

Say the history of my commits is A - B - C and I have only this branch.

OK:

A--B--C   <-- branch (HEAD)

I need to go back to B, but I also want to retain the code I wrote in C

OK. Clearly what we'll want is one name identifying commit B and another one identifying commit C. But we also need to concern ourselves with the index and work-tree!

There's only one index and one work-tree, [3] and those aren't copied by git clone. Only the commits are permanent. So if you have anything unsaved in your index and/or your work-tree, you probably should save it now. (By committing, probably; you can also use git stash to make commits that aren't on any branch, but let's not go there, at least not yet.) Let's assume you don't, so as to remove the question entirely.

The graph won't change. You just need to add a new name. There are lots of ways to do that, but for illustration, let's do it this way: let's start by creating a new branch name that also points to commit C, which we'll call save. To do that, we'll use git branch, which can create new names pointing to existing commits:

$ git branch save

The default for where the new name points is to use the current commit (via HEAD and the current branch name), so now we have:

A--B--C   <-- branch (HEAD), save

HEAD has not moved: it's still attached to branch, which still points to C. Note that both branches identify the same commit C, and all three commits are on both branches. [4]

Now that we have the name save saving the hash ID of C, we're free to move the name branch to point to commit B. To do that, we'll use git reset. We'd like to have our index and work-tree match commit B too, so we want git reset --hard—which will replace our index and work-tree, which is why it was important to make sure we didn't need to save anything from them:

$ git reset --hard <hash-of-B>

giving:

A--B   <-- branch (HEAD)
    \
     C   <-- save
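Put together, the two commands above can be tried end to end in a scratch repository. This is a sketch under assumptions: the file contents and identity settings are invented, and the hash of B is captured in a shell variable rather than pasted by hand.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch  # name the only branch "branch", as in the text
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "A" > file;          git add file; git commit -q -m "A"
echo "B, working" > file; git add file; git commit -q -m "B"
b=$(git rev-parse HEAD)
echo "C, broken" > file;  git add file; git commit -q -m "C"
c=$(git rev-parse HEAD)

git branch save             # second name for commit C
git reset -q --hard "$b"    # move "branch" to B; index and work-tree follow

echo "work-tree now: $(cat file)"
echo "saved in C:    $(git show save:file)"
```

Nothing from C is lost: the commit stays reachable through save and can be checked out or compared against at any time.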

There are, of course, a lot of other options. For instance, we could leave branch pointing to C and create a new name pointing to B:

A--B   <-- start-over
    \
     C   <-- branch (HEAD)

and to do that we could use:

$ git branch start-over <hash-of-B>

Since we didn't move HEAD, there is no need to disturb the index and work-tree in any way. If we had uncommitted work we could now run git add if needed (to update the index if needed) and git commit to make a new commit D that would have C as its parent.
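The alternative can be sketched the same way. As before, the scratch repository, file contents, and identity settings are assumptions; the point is only that creating start-over leaves HEAD, the index, and the work-tree alone.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/branch  # name the only branch "branch", as in the text
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

echo "A" > file;          git add file; git commit -q -m "A"
echo "B, working" > file; git add file; git commit -q -m "B"
b=$(git rev-parse HEAD)
echo "C, broken" > file;  git add file; git commit -q -m "C"

git branch start-over "$b"   # new name pointing at B; HEAD stays where it was

echo "current branch: $(git symbolic-ref --short HEAD)"
echo "work-tree:      $(cat file)"
```

To actually work from B in this variant, you would still need to switch to the new name, for example with git checkout start-over.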


[3] This is actually not true. There's one main work-tree and it has one main index. You can create as many temporary index files as you like, and since Git 2.5, you can add auxiliary work-trees whenever you like. Each added work-tree has its own separate index (the index indexes / caches the work-tree, after all) and its own HEAD so that each can, and in fact must, be on a different branch. But again, that's not something you need to worry about yet. Creating a temporary index is really just for special-purpose actions: for instance, that's how git stash commits your current work-tree without messing with other things.

[4] This is where Git and Mercurial differ enormously: in Mercurial, every commit is on exactly one branch, where it remains forever. You literally can't make two branch names that identify the same commit. Mercurial also doesn't use this "branch name equals tip commit, and other commits are implied by walking the graph" trick.


There's a trick for hash IDs

I'm just going to mention this in passing here. Above, we had a lot of cases where you probably have to run git log and cut-and-paste big ugly hash IDs. We already know that a name, like a branch name, lets us use the name instead of the ID. Instead of writing out the hash ID of C as pointed-to by branch or by save, we can just use the name:

git show save

for instance will extract commit C, then extract commit B, compare the two, and show us what's different in the snapshots in B and C. But we can go one better:

git show save~1

means: Find commit C. Then, step back one parent link. That's commit B. So git show will now extract the snapshots in B and its parent A, compare the two, and show us what we changed in B. The tilde ~ and hat ^ characters can be used as suffixes on any revision specifier. The complete description of how to specify revisions (commits or commit ranges, mostly) is documented in the gitrevisions manual. There are a lot of ways to do it!
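The suffixes can be checked with git rev-parse, which turns any revision specifier into a hash ID. This sketch assumes a scratch repository with three empty commits and a branch name save at the last one.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"          # throwaway repository
git init -q
git config user.name "Example User"      # placeholder identity for the demo
git config user.email "user@example.com"

git commit -q --allow-empty -m "A"
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"
b=$(git rev-parse HEAD)
git commit -q --allow-empty -m "C"
git branch save                          # save names commit C

by_tilde=$(git rev-parse save~1)         # one parent link back from C
by_hat=$(git rev-parse 'save^')          # first parent of C, same commit here
grandparent=$(git rev-parse save~2)      # two links back
echo "save~1 = $by_tilde"
echo "save^  = $by_hat"
echo "save~2 = $grandparent"
```

On a linear history like this one, ~1 and ^ agree; they only diverge at merge commits, where ^2 selects the second parent.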


Some years ago, I tried my hand at starting a book that would use both Git and Mercurial as a way to get people started with graph-and-hash-based distributed source code control. Unfortunately most of the work on that happened between jobs, and I haven't been between-jobs for years now, so it's stalled and getting stale. But for those who want to see what's there, it's here.
