How to edit and update files for different git branches?


Question

A repository on my GitHub has two branches: master and solution. First I git clone it:

git clone <master url>

Then I cd into that folder and switch to the solution branch:

git checkout solution

I find that the contents of the files, e.g. README.md, are still the same as in master. How can I access the solution files?

Then I tried git pull to update the files in the solution branch:

git pull origin solution

That works, and now the contents of the files are those of solution. But when I try to switch back to master, it fails and says I need to merge, I think because some files have different contents in the two branches. How do I switch back?

In general, how do I edit and update files in different branches, and how do I easily switch back and forth?

Another example:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2     
              \
               M--N
                   \
                    P

Is another worktree needed?

Solution

Those new to Git often think that Git stores changes in branches. This is not true. In your case, though, I think what you are running into is the fact that when you do work in a Git repository, you do so in what Git calls your working tree. Anything you do here is not in Git (yet).

You might want to use git worktree add to deal with your particular situation. We'll get to that after covering how Git handles all of this, because it won't make any sense without a lot of basics.

The way I like to explain this is that Git does not store changes at all, and does not really care about branches. What Git stores, and cares about, are commits. This means that you need to know what a commit is and does for you, how you find a commit, how you use an existing commit, and how you make a new commit.

What commits are

The basic entity that you will use, as you do work using Git, is the commit. There are three things you need to know about a commit. You just have to memorize these as they are arbitrary: there's no particular reason they had to be done like this, it's just that when Linus Torvalds wrote Git, these are the decisions he made.

  1. Each commit is numbered.

    The numbers, however, are not simple counting numbers: we don't have commit #1 followed by commits 2, 3, 4, and so on. Instead, each commit gets a unique, but very big and ugly, number expressed in hexadecimal, that is between 1 and something very large.[1] Every commit in every repository gets a unique, random-looking number.

    It looks random, but isn't. It's actually a cryptographic checksum of the internal object content. This peculiar numbering scheme enables two Gits to exchange content by handing each other these large numbers.

    A key side effect of this is that it's physically impossible to change what's in a commit. (This is true of all of Git's internal objects.) The reason is that the hash ID, which is how Git finds the object, is a checksum of the content. Take one of these out, make changes to its content, and put it back, and what you get is a new commit (or new other internal object), with a new and different hash ID. The existing one is still in there, under the existing ID. This means not even Git itself can change the content of a stored commit.

  2. Each commit stores a full snapshot of every file.

    More precisely, each commit stores a full copy of every file that Git knew about at the time you, or whoever, made the commit. We'll get into this "knew about" part in a bit, when we look at how to make a new commit.

    These copies are read-only, compressed, and stored in a format that only Git itself can read. They are also de-duplicated, not just within each commit, but across every commit. That is, if your Git repository had some particular copy of a README file or whatever, stored in some commit, and you ever make a new commit that has the same copy of the file—even under some other name—Git will just re-use the previous copy.

  3. And, each commit stores some metadata.

    The metadata with a commit include the name and email address of the person who made that commit. Git gets this from your user.name and user.email settings, and simply believes that you are whoever you claim to be. The metadata include a date-and-time stamp of when you (or whoever) made the commit.[2] The metadata also include why you (or whoever) made the commit, in the form of a commit message. Git isn't particularly strict about what goes into the message, but messages should generally look a lot like email, with a short one-line subject, and then a message body.

    One part of this metadata, though, is strictly for Git itself. Each commit stores, in its metadata, the commit number of the previous commit.[3] This forms commits into simple backwards-looking chains:

    ... <-F <-G <-H
    

    Here, each of the uppercase letters stands in for some actual commit hash ID. Commit H, the most recent one, has inside it the actual hash ID of earlier commit G. When Git extracts earlier commit G from wherever it is that Git keeps all the commits, commit G has inside it the actual hash ID of earlier-than-G commit F.

    We say that commit H points to commit G, which points to commit F. Commit F in turn points to some still-earlier commit, which points to another earlier commit, and so on. This works its way all the way back to the very first commit ever, which—being the first commit—can't point backwards, so it just doesn't.
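The first two points in the list above (hash IDs are checksums of content, and identical content is stored only once) can be checked in a throwaway repository. This is a minimal sketch: the temporary directory, the file names README and COPY, and the placeholder identity are assumptions made for illustration only.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# The hash ID is computed from the content: same bytes in, same ID out.
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)
id_c=$(printf 'hello!\n' | git hash-object --stdin)
echo "same content:    $id_a / $id_b"
echo "changed content: $id_c"

# Two file names with identical content share one stored blob object.
printf 'hello\n' > README
cp README COPY
git add README COPY
git commit -q -m "two names, one content"
blob_readme=$(git rev-parse HEAD:README)
blob_copy=$(git rev-parse HEAD:COPY)
echo "README -> $blob_readme"
echo "COPY   -> $blob_copy"
```

Changing even one byte produces a different ID, which is why "editing" a stored object really means creating a new one next to the old one.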

This backwards-looking chain of commits in a Git repository is the history in that repository. History is commits; commits are history; and Git works backwards. We start with the most recent, and work backwards as needed.
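To see the backwards-pointing chain for yourself, you can print a raw commit object and read its parent line. The sketch below builds three commits that stand in for F, G and H; the repository location, file name and identity are illustrative assumptions.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Make three commits standing in for F, G and H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

h=$(git rev-parse HEAD)       # newest commit, "H"
g=$(git rev-parse HEAD~1)     # one step back, "G"
f=$(git rev-parse HEAD~2)     # two steps back, "F" (the root commit here)

# The raw commit object: tree (snapshot), parent, author, committer, message.
git cat-file -p "$h"

# H stores G's hash ID as its parent; the root commit stores no parent at all.
parent_of_h=$(git cat-file -p "$h" | sed -n 's/^parent //p')
root_parents=$(git cat-file -p "$f" | grep -c '^parent ' || true)
echo "parent of H: $parent_of_h"
echo "parent lines in root commit: $root_parents"
```

Commands such as git log do the same walk: start at one commit, then follow parent links backwards.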


[1] For SHA-1, the number is between 1 and 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,975. This is ffffffffffffffffffffffffffffffffffffffff in hexadecimal, or 2^160 - 1. For SHA-256 it's between 1 and 2^256 - 1. (Use any infinite-precision calculator such as bc or dc to compute 2^256. It's very big. Zero is reserved as the null hash in both cases.)

[2] Actually, there are two name-email-time triples, one called "author" and one called "committer". The author is the person who wrote the commit itself, and (back in the early days of Git being used to develop Linux) the committer was the person who received the patch by email and put it in. That's why the commit messages are formatted as if they were email: often, they were email.

[3] Most commits have exactly one previous commit. At least one commit (the very first commit) has no previous commit; Git calls this a root commit. Some commits point back to two earlier commits, instead of just one: Git calls them merge commits. (Merge commits are allowed to point back to more than two earlier commits: a commit with three or more parents is called an octopus merge. They don't do anything you couldn't do with multiple ordinary merges, but if you're tying together multiple topics, they can do that in a sort of neat way.)


Branch names are how we find commits

Git can always find any commit by its big ugly hash ID. But these hash IDs are big, and ugly. Can you remember all of yours? (I can't remember mine.) Fortunately, we don't need to remember all of them. Notice how, above, we were able to start with H and work backwards from there.

So, if commits are in backwards-pointing chains—and they are—and we need to start from the newest commit in some chain, how do we find the hash ID of the last commit in the chain? We could write it down: jot it down on paper, or a whiteboard, or whatever. Then, whenever we make a new commit, we could erase the old one (or cross it off) and write down the new latest commit. But why would we bother with that? We have a computer: why don't we have it remember the latest commit?

This is exactly what a branch name is and does. It just holds the hash ID of the last commit in the chain:

...--F--G--H   <-- master

The name master holds the actual hash ID of the last commit H. As before, we say that the name master points to this commit.

Suppose we'd like to make a second branch now. Let's make a new name, develop or feature or topic or whatever we like, that also points to commit H:

...--F--G--H   <-- master, solution

Both names identify the same "last commit", so all the commits up through H are on both branches now.
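A quick way to convince yourself that a branch name is nothing more than a stored hash ID is to create a second name and compare what both resolve to. This is a sketch under assumed names (a temporary repository, a README.md file, a placeholder identity), not a recipe for your real repository.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first" > README.md
git add README.md
git commit -q -m "commit H"

# Creating a branch only creates a new name pointing at the current commit.
git branch solution

master_id=$(git rev-parse master)
solution_id=$(git rev-parse solution)
echo "master   -> $master_id"
echo "solution -> $solution_id"

# List every branch name together with the hash ID it holds.
git for-each-ref --format='%(objectname) %(refname:short)' refs/heads
```

No files are copied and no commit is made by git branch; only a new name is recorded.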

The special feature of a branch name, though, is that we can switch to that branch, using git switch or, in Git predating Git 2.23, git checkout. We say git checkout master and we get commit H and are "on" master. We say git switch solution and we also get commit H, but this time we are "on" solution.

To tell which name we're using to find commit H, Git attaches the special name HEAD to one (and only one) branch name:

...--F--G--H   <-- master, solution (HEAD)
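You can ask Git which branch name HEAD is attached to. The following sketch uses git checkout so that it also runs on Git older than 2.23 (git switch would do the same job on newer versions); the repository, file and identity are illustrative assumptions.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "first" > README.md
git add README.md
git commit -q -m "commit H"
git branch solution

git checkout -q solution                   # or: git switch solution (Git 2.23+)
on_branch=$(git symbolic-ref --short HEAD) # the name HEAD is attached to
head_id=$(git rev-parse HEAD)              # the commit found through that name
echo "on $on_branch at $head_id"

git checkout -q master
back_on=$(git symbolic-ref --short HEAD)
echo "now on $back_on at $(git rev-parse HEAD)"
```

Both names select the same commit here, so only the attachment of HEAD changes between the two checkouts.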

If we now make a new commit (we'll look at how we do that in a moment), Git makes the new commit by writing it out with commit H as its parent, so that the new commit points back to H. We'll call the new commit I, although its actual number will just be some other big random-looking hash ID. We can't predict the hash ID because it depends on the exact second at which we make it (because of the time stamps); we just know that it will be unique.[4]

Let's draw the new chain of commits, including the sneaky trick that Git uses:

...--F--G--H   <-- master
            \
             I   <-- solution (HEAD)

Having made new commit I, Git wrote the new commit's hash ID into the current branch name, solution. So now the name solution identifies commit I.
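The "sneaky trick" can be observed directly: after a commit, only the branch name that HEAD is attached to moves. A minimal sketch, assuming a temporary repository with one file and a placeholder identity:

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "first" > README.md
git add README.md
git commit -q -m "commit H"
h=$(git rev-parse HEAD)

git checkout -q -b solution                # new name, HEAD attached to it
echo "solution text" >> README.md
git add README.md
git commit -q -m "commit I"

i=$(git rev-parse solution)                # solution now names the new commit
parent_of_i=$(git rev-parse "solution^")   # ...whose parent is H
master_after=$(git rev-parse master)       # master did not move
git log --oneline --graph --all
```

The name master still holds H's hash ID, which is why switching back to master shows the files as they were in H.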

If we switch back to the name master, we'll see all the files as they were in commit H, and when we switch back to solution again, we'll see the files as they were in commit I. Or, that is, we might see them that way. But we might not!


[4] The pigeonhole principle tells us that this will eventually fail. The large size of hash IDs tells us that the chance of failure is minute, and in practice, it never occurs. The birthday problem requires that the hash be very large, and deliberate attacks have moved from a purely theoretical issue with SHA-1 to being something at least theoretically practical, which is why Git is moving to larger and more-secure hashes.


Making new commits

It's time now to look more closely at how we actually make new commit I above. Remember, we mentioned that the data in a commit—the files making up the snapshot—are completely read-only. The commit stores files in a special, compressed, read-only, Git-only format that only Git itself can read. This is quite useless for doing any actual work.

For this reason, Git must extract the files from the commit, into some sort of work area. Git calls this work area your working tree or work-tree. This concept is pretty simple and obvious. Git just takes the "freeze-dried" files from the commit, rehydrates or reconstitutes them, and now you have usable files. These usable, work-tree copies of the files are of course copies. You can do anything you want with them. None of that will ever touch any of the originals in the commit.

As I mentioned at the top of this, these work-tree copies of your files are not in Git. They are in your work area. They are your files, not Git's. You can do anything you want to or with them. Git merely filled them in from some existing commit, back when you told Git to do that. After that, they're all yours.

At some point, though, you would probably like Git to make a new commit, and when it does that, you'd like it to update its files from your files. If Git just re-saved all of its own files unchanged, that would be pretty useless.

In other, non-Git, version control systems, this is usually really easy. You just run, e.g., hg commit in Mercurial, and Mercurial reads your work-tree files back, compresses them into its own internal form,[5] and makes the commit. This of course requires a list of known files (and, e.g., hg add updates the list). But Git doesn't do that: that's too easy, and/or maybe too slow.

What Git does instead is to keep, separately from the commits and from your work-tree, its own extra "copy" of each file. This file is in the "freeze-dried" (compressed and de-duplicated) format, but isn't actually frozen like the one in a commit. In effect, this third "copy" of each file sits between the commit and your work-tree.[6]

This extra copy of each file exists in what Git calls, variously, the index, or the staging area, or—rarely these days—the cache. These three names all describe the same thing. (It's mainly implemented as a file named .git/index, except that this file can contain directives that redirect Git to other files, and you can have Git operate with other index files.)

So, what Git does when you switch to some particular commit is:

  • extract each file from that commit;
  • put the original data (and file name) into Git's index; and
  • extract the Git-formatted ("freeze-dried") file into your work-tree, where you can see and work on it.
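Right after a checkout, the three copies described above (commit, index, work-tree) all agree. One way to peek at them, sketched here with an assumed temporary repository, a README.md file and a placeholder identity:

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "first" > README.md
git add README.md
git commit -q -m "commit H"

# Each index entry is: mode, blob hash ID, stage number, then the path.
git ls-files -s

index_blob=$(git ls-files -s README.md | cut -d' ' -f2)   # copy in the index
commit_blob=$(git rev-parse HEAD:README.md)               # copy in the commit
worktree_blob=$(git hash-object README.md)                # hash of the ordinary file
echo "index:     $index_blob"
echo "commit:    $commit_blob"
echo "work-tree: $worktree_blob"
```

The index holds only a hash ID per file, not a second full copy of the content.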

When you run git commit, what Git does is:

  • package up the index's content, as of that moment, as the saved snapshot;
  • assemble and package up all the appropriate metadata to make the commit object—this includes making the new commit point back to the current commit, by using the current commit's hash ID as the new commit's parent;
  • write all of that out as a new commit; and
  • stuff the new commit's hash ID into the current branch name.

So, whatever is in the index (aka staging area) at the time you run git commit is what gets committed. This means that if you've changed stuff in your working tree—whether that's modifying some file, adding a new file, removing a file entirely, or whatever—you need to copy the updated file back into Git's index (or remove the file from Git's index entirely, if the idea is to remove the file). In general, the command you use to do this is git add. This command takes some file name(s) and uses your work-tree copy of that file, or those files, to replace the index copy of that file, or those files. If the file has gone missing from your work-tree (because you removed it), git add updates Git's index by removing the file from there, too.

In other words, git add means "make the index copy of this file (or these files) match the work-tree copy". Only if the file is all-new (does not exist in the index at the time you run git add) is the file really added to the index.[7] For most files, it's really just "replace the existing copy".
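Because git commit snapshots the index rather than the work-tree, an edit made after git add is not included in the commit. The sketch below shows this with a single hypothetical file, file.txt, in a temporary repository:

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first version"

echo "v2 staged" > file.txt
git add file.txt                           # index copy now matches "v2 staged"
echo "v3 unstaged" > file.txt              # work-tree copy moves on; index does not

git commit -q -m "commit whatever is in the index"

committed=$(git show HEAD:file.txt)        # what the new commit stored
in_worktree=$(cat file.txt)                # what is on disk
echo "committed: $committed"
echo "work-tree: $in_worktree"
git status --short                         # still reports file.txt as modified
```

To get the last edit into a commit as well, run git add on the file again before the next git commit.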

The index copy of a file is sort-of-in-Git: it's stored in the big database of all internal objects. But if the index copy of a file has never been committed before, it's in a precarious state. It's not until you run git commit, and Git packages up everything that's in the index and turns it into a new commit, that it's safely committed to Git and can't be removed or destroyed.[8]


[5] Mercurial uses a very different storage scheme, in which it often stores diffs, but occasionally stores snapshots. This is mostly irrelevant, but Git provides and documents tools that can reach directly into its internal storage format, so it can be important, at times, to know about Git's internal storage format.

[6] Because it's always de-duplicated, this "copy" of the file takes no space initially. More precisely, it takes no space for its content. It occupies some amount of space within Git's index file, but that's relatively small: just a few dozen or hundred bytes per file, typically. The index contains just the file's name, some mode and other cache information, and an internal Git object hash ID. The actual content is stored in the Git object database, as an internal blob object, which is how Git does the de-duplication.

[7] Perhaps git add should have been called git update-index or git update-staging-area, but there already is a git update-index. The update-index command requires knowing how Git stores files as internal blob objects: it's not very user-friendly, and in fact is not aimed at being something you would ever use yourself.

[8] A committed file exists in Git as a mostly-permanent and completely-read-only entity, but its permanence, the part qualified by "mostly" here, is predicated on the commit's permanence. It is possible to drop commits entirely. If you've never sent some particular commit to any other Git, dropping the commit from your own Git repository will make it go away for real (though not right away). The big problem with dropping commits entirely is that if you have sent it to some other Git, that other Git may give it back to yours again later: commits are sort of viral that way. When two Gits have Git-sex with each other, one of them is likely to catch commits.


Summary

So, now we know what commits are: numbered objects with two parts, data (snapshot) and metadata (information) that are strung together, backwards, through their metadata. Now we know what branch names are too: they store the hash ID of a commit that we should call the last in some chain (even if there are more commits after it). We know that nothing inside any commit can ever be changed, but we can always add new commits. To add a new commit, we:

  • have Git extract an existing commit, usually by branch name;
  • muck with the files that are now in our work-tree;
  • use git add to update any files we want updated: this copies the updated content from our work-tree back into Git's index; and
  • use git commit to make a new commit, that updates the branch name.

If we take some series of commits like this:

...--G--H   <-- main, br1, br2

and attach HEAD to br1 and make two new commits we'll get:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- main, br2

If we now attach HEAD to br2 and make two new commits, we will get:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2 (HEAD)

Note that in each step, we have merely added a commit to the set of all commits in the repository. The name br1 now identifies the last commit on its chain; the name br2 identifies the last commit on its chain; and the name main identifies the last commit on that chain. Commits H and earlier are on all three branches.[9]
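The graph drawn above can be reproduced in a scratch repository. In this sketch, make_commit is a hypothetical helper defined only for the example, and each commit simply adds one small file named after its letter:

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: edit, git add, git commit in one step.
make_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

make_commit G
make_commit H
git branch br1
git branch br2

git checkout -q br1                        # attach HEAD to br1
make_commit I
make_commit J

git checkout -q br2                        # attach HEAD to br2
make_commit K
make_commit L

git log --oneline --graph --all

fork=$(git merge-base br1 br2)                    # where the two chains meet
main_id=$(git rev-parse main)                     # main still names H
only_br1=$(git rev-list --count main..br1)        # commits reachable from br1 but not main
only_br2=$(git rev-list --count main..br2)
echo "fork point: $fork, main: $main_id, br1 adds $only_br1, br2 adds $only_br2"
```

The name main never moved, because HEAD was never attached to it while commits were being made.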

At all times, there is only one current commit. It is identified by HEAD: HEAD is attached to one of your branch names. The current commit's files get copied out to your work-tree, through Git's index, and there's only one work-tree and one index, too. If you want to switch to some other branch name, and that other branch name reflects some other commit, you will have to switch around Git's index and your work-tree as well.[10]


[9] Other version control systems take other positions. For instance, in Mercurial, a commit is only ever on one branch. This requires different internal structures.

[10] This isn't completely true, but the details get complicated. See the Stack Overflow question "Checkout another branch when there are uncommitted changes on the current branch".


git worktree add

Now that we know how to use our one work-tree, Git's one index, and the one single HEAD, we can see how it can be painful to switch around from branch to branch: all our work-tree files get updated each time we switch (except for the complicated situation mentioned in footnote 10, anyway).

If you need to work in two different branches, there's a simple solution: make two separate clones. Each clone has its own branches, its own index, and its own work-tree. But this has one big drawback: it means you have two entire repositories. They might use up a lot of extra space.[11] And, you might not like having to deal with multiple clones and the extra branch names involved. What if, instead, you could share the underlying clone, but have another work-tree?

To make a second work-tree useful, this new work-tree has to have its own index and its own HEAD. And that's what git worktree add does: it makes a new work-tree, somewhere outside of the current work-tree,[12] and gives that new work-tree its own index and HEAD. The added work-tree must be on some branch that is not checked out in the main work-tree, and is not checked out in any other added work-tree.
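For the master/solution situation in the question, adding a work-tree might look like the sketch below. The directory names proj123 and proj123-solution, the file content, and the identity are assumptions made for the example; git worktree needs Git 2.5 or later.

```shell
set -e
base=$(mktemp -d)                          # scratch area (hypothetical location)
mkdir "$base/proj123"
cd "$base/proj123"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "master text" > README.md
git add README.md
git commit -q -m "commit H"
git branch solution

# Add a second work-tree "right next door", with solution checked out in it.
git worktree add ../proj123-solution solution

main_head=$(git symbolic-ref --short HEAD)                         # HEAD of the main work-tree
alt_head=$(git -C ../proj123-solution symbolic-ref --short HEAD)   # HEAD of the added one
echo "main work-tree is on:  $main_head"
echo "added work-tree is on: $alt_head"
git worktree list
```

From here on you edit master's files in proj123 and solution's files in proj123-solution, with no switching in between.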

Because the added work-tree has its own separate things, you can do work in there without interfering with the work you're doing in the main work-tree. Because both work-trees share a single underlying repository, any time you make a new commit in one work-tree, it's immediately visible in the other one. Because making a commit changes the hash ID stored in a branch name, the added work-tree must not use the same branch name as any other work-tree (otherwise the linkage between branch name, current commit hash ID, work-tree content, and index content gets messed up)—but an added work-tree can always use detached HEAD mode (which we haven't described here).
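Both properties described above (commits are shared at once, and a branch can be checked out in only one work-tree) can be tried out as follows. As before, the directory names and file content are illustrative assumptions, and Git 2.5 or later is required.

```shell
set -e
base=$(mktemp -d)                          # scratch area (hypothetical location)
mkdir "$base/proj123"
cd "$base/proj123"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "master text" > README.md
git add README.md
git commit -q -m "commit H"
git branch solution
git worktree add ../proj123-solution solution

# Commit inside the added work-tree...
cd ../proj123-solution
echo "solution text" >> README.md
git add README.md
git commit -q -m "commit made in the added work-tree"

# ...and look at the result from the main work-tree.
cd ../proj123
seen_from_main=$(git rev-parse solution)
made_in_alt=$(git -C ../proj123-solution rev-parse HEAD)
main_readme=$(cat README.md)               # main work-tree files are untouched

# Checking out the same branch a second time is refused.
if git checkout -q solution 2>/dev/null; then
    refused=no
else
    refused=yes
fi
echo "solution seen from main: $seen_from_main"
echo "second checkout of solution refused: $refused"
```

The refusal is what keeps the branch name, the index and the work-tree content of each work-tree consistent with one another.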

Overall, git worktree add is a pretty nice way to deal with your situation. Be sure that your Git version is at least 2.15 if you're going to do a lot of work with this. The git worktree command was new in Git version 2.5, but it had a nasty bug that can bite you if you have a detached HEAD or are slow about working in it, and you also do any work in the main work-tree; this bug was not fixed until Git version 2.15.


[11] If you make a local clone using path names, Git will try to hard-link internal files to save lots of space. This mostly solves this problem, but some people still won't like having two separate repositories, and over time the space usage will go up as well. There are tricks to handle that too, using Git's alternates mechanism. I believe GitHub, for instance, uses this to make forks work better for them. But overall, git worktree fills a perceived gap; perhaps you'll like it.

[12] Technically, an added work-tree does not have to be outside the main work-tree. But it's a bad idea to put it inside: it just gets confusing. Place it somewhere else. Usually, "right next door" is a good plan: if your main work-tree is in $HOME/projects/proj123/, you might use $HOME/projects/proj123-alt or $HOME/projects/proj123-branchX or whatever.
