Detailed reason why remote git rebase is so evil


Question


    So I come from a centralized VCS background and am trying to nail down our workflow in Git (new company, young code base). One question I can't find a simple yet detailed answer to is what exactly does rebase on a remote branch do. I understand it rewrites the history, and in general should be limited to local branches only.

    The workflow I'm currently trying to vet out involves a remote collaboration branch, each dev "owning" one for the purpose of sharing code. (With 2 developers, and at most 3 in the foreseeable future, a feature branch for each project & feature request seems excessive and more overhead than benefit gained.)

    Then I came across this answer and tried it and it accomplished what I'd like - a dev commits and pushes often to his own collab branch; when he knows what is approved to be released to staging, he can rebase remotely (to squash and perhaps reorganize) before merging into develop.

    Enter the original question - if the remote branch is for the purpose of collaboration, someone else is bound to pull it sooner or later. If it is a process/training issue to not have the 'guest developer' commit to that collab branch, what actually happens when the branch owner rebases that remote branch?

    Answer

    It's not really evil, it's a matter of implementations and expectations.

    We start with a tangle of facts:

    • Every Git hash represents some unique object. For our purposes here we need only consider commit objects. Each hash is the result of applying a cryptographic hash function (for Git, specifically, it's SHA-1) to the contents of the object. For a commit, the contents include the ID of the source tree; the name and email address and time/date-stamp of the author and committer; the commit message; and most crucially here, the ID of the parent commit.

    • Changing even just a single bit in the content results in a new, very-different hash ID. The cryptographic properties of the hash function, which serve to authenticate and verify each commit (or other object), also mean that there is no practical way to have some different object have the same hash ID. Git counts on this for transferring objects between repositories, too.

    • Rebase works (necessarily) by copying commits to new commits. Even if nothing else changes—and usually, the source code associated with the new copies differs from the original source code—the whole point of the rebase is to re-parent some commit chain. For instance, we might start with:

      ...--o--*--o--o--o   <-- develop
               \
                o--o       <-- feature
      

      where branch feature separates from branch develop at commit *, but now we would like feature to descend from the tip commit of develop, so we rebase it. The result is:

      ...--o--*--o--o--o        <-- develop
               \        \
                \        @--@   <-- feature
                 \
                  o--o          abandoned [used to be feature, now left-overs]
      

      where the two @s are copies of the original two commits.

    • Branch names, like develop, are just pointers pointing to a (single) commit. The things we tend to think of as "a branch", like the two commits @--@, are formed by working backwards from each commit to its parent(s).

    • Branches are always expected to grow new commits. It's perfectly normal to find that develop or master has some new commits added on, so that the name now points to a commit—or the last of many commits—that points back to where the name used to point.

    • Whenever you get your Git to synchronize (to whatever degree) your repository with some other Git and its other repository, your Git and their Git have an exchange of IDs—specifically, hash IDs. Exactly which IDs depends on the direction of the transfer, and any branch names you ask your Git to use.

    • A remote-tracking branch is actually an entity that your Git stores, associated with your repository. Your remote-tracking branch origin/master is, in effect, your Git's place to remember "what the Git at origin said his master was, the last time we talked."
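The first three facts can be sketched in a few lines of Python. This is a simplified model, not Git's real commit format: `commit_id` and `copy_onto` are hypothetical helper names, the tree IDs are placeholder strings, and the author line is invented. It keeps only the property that matters here, namely that the parent ID is part of the hashed content.

```python
import hashlib

AUTHOR = "A Dev <dev@example.com> 0 +0000"  # invented identity and timestamp


def commit_id(tree, parents, message, author=AUTHOR):
    """Return a 40-character SHA-1 over a simplified commit body.

    Like Git, the body is prefixed with a "commit <size>\\0" header, but the
    body layout here is only an approximation of the real format.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def copy_onto(chain, base):
    """Copy a chain of (tree, message) pairs, oldest first, on top of base."""
    ids, parent = [], base
    for tree, message in chain:
        parent = commit_id(tree, [parent], message)
        ids.append(parent)
    return ids


star = commit_id("tree-star", [], "fork point *")
develop_tip = commit_id("tree-develop", [star], "tip of develop")
feature = [("tree-f1", "feature 1"), ("tree-f2", "feature 2")]

originals = copy_onto(feature, star)       # o--o, built on *
copies = copy_onto(feature, develop_tip)   # @--@, same content, new parent

print(originals)
print(copies)
```

Even though the trees and messages are identical, every copy gets a different ID from its original, because the parent it records is different. That is why a rebase cannot "move" commits and has to abandon the old ones.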

    So, now we take these seven items, and look at how git fetch works. You might run git fetch origin, for instance. At this point, your Git calls up the Git on origin and asks it about its branches. They say things like master = 1234567 and branch = 89abcde (though the hash values are all exactly 40 characters long, rather than these 7-character ones).

    Your Git may already have these commit objects. If so, we are nearly done! If not, it asks their Git to send those commit objects, and also any additional objects your Git needs to make sense of them. The additional objects are any files that go with those commits, and any parent commit(s) those commits use that you do not already have, plus the parents' parents, and so on, until we get to some commit object(s) that you do have. This gets you all the commits and files you need for any and all new history. [1]

    Once your Git has all the objects safely stored away, your Git then updates your remote-tracking branches with the new IDs. Their Git just told you that their master is 1234567, so now your origin/master is set to 1234567. The same goes for their branch: it becomes your origin/branch and your Git saves the 89abcde hash.
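The fetch steps just described can be modelled roughly like this. It is a sketch under assumptions: each repository is a plain dict from commit ID to parent IDs, files are ignored, the IDs other than 1234567 and 89abcde are made up, and `fetch` is a hypothetical function, not Git's actual transfer protocol.

```python
def fetch(theirs, their_refs, mine, my_remote_refs, remote="origin"):
    """Copy missing commits from `theirs` into `mine`, then update the
    remote-tracking names. Returns the list of commit IDs that were copied."""
    copied = []
    for name, tip in their_refs.items():
        todo = [tip]
        while todo:
            cid = todo.pop()
            if cid in mine:
                continue  # we have it, so (in this model) we have its ancestors
            mine[cid] = list(theirs[cid])
            copied.append(cid)
            todo.extend(theirs[cid])
        # Only after the objects are stored do we record where they pointed.
        my_remote_refs[f"{remote}/{name}"] = tip
    return copied


# Their repository: commit ID -> parent IDs.
theirs = {
    "o1": [], "star": ["o1"],
    "o2": ["star"], "1234567": ["o2"],   # master
    "o3": ["star"], "89abcde": ["o3"],   # branch
}
their_refs = {"master": "1234567", "branch": "89abcde"}

# Our repository only has the older history so far.
mine = {"o1": [], "star": ["o1"]}
remote_refs = {}

new_commits = fetch(theirs, their_refs, mine, remote_refs)
print(sorted(new_commits))
print(remote_refs)
```

Note that nothing in this model touches local branch names: only the `origin/...` entries change.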

    If you now git checkout branch, your Git uses origin/branch to make a new local label, pointing to 89abcde. Let's draw this:

    ...--o--*--o--1   <-- master, origin/master
             \
              o--8    <-- branch, origin/branch
    

    (I've shortened 1234567 to just 1 here, and 89abcde to just 8, to get them to fit better.)

    To make things really interesting, let's make our own new commit on branch, too. Let's say it gets numbered aaaaaaa...:

    ...--o--*--o--1    <-- master, origin/master
             \
              o--8     <-- origin/branch
                  \
                   A   <-- branch
    

    (I shortened aaaaaaa... to just A).

    The interesting question, then, is what happens if they—the Git from which you fetch—rebase something. Suppose, for instance, that they rebase branch onto master. This copies some number of commits. Now you run git fetch and your Git sees that they say branch = fedcba9. Your Git checks to see if you have this object; if not, you get it (and its files) and its parent (and that commit's files) and so on until we reach some common point—which will, in fact, be commit 1234567.

    Now you have this:

    ...--o--*--o--1        <-- master, origin/master
             \     \
              \     o--F   <-- origin/branch
               \
                o--8--A    <-- branch
    

    Here I've written F for commit fedcba9, the one origin/branch now points to.

    If you come across this later without realizing that the upstream guys rebased their branch (your origin/branch), you might look at this and think that you must have written all three commits in the o--8--A chain, because they're on your branch and not on origin/branch anymore. But the reason they're not on origin/branch is that the upstream abandoned them in favor of the new copies. It's a bit hard to tell that those new copies are, in fact, copies, and that you, too, should abandon those commits.
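One way to notice that an upstream name was rewritten rather than extended is to check whether its old tip is still an ancestor of its new tip. The sketch below assumes the commit graph from the diagram, stored as a dict from commit label to parent labels; `ancestors` and `is_fast_forward` are hypothetical helper names, and the labels `o1` to `o4` are made up for the unnamed `o` commits.

```python
def ancestors(graph, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen, todo = set(), [tip]
    while todo:
        cid = todo.pop()
        if cid not in seen:
            seen.add(cid)
            todo.extend(graph[cid])
    return seen


def is_fast_forward(graph, old_tip, new_tip):
    """True if new_tip merely adds commits on top of old_tip."""
    return old_tip in ancestors(graph, new_tip)


graph = {
    "o1": [], "star": ["o1"],
    "o2": ["star"], "1": ["o2"],                  # master
    "o3": ["star"], "8": ["o3"], "A": ["8"],      # our branch
    "o4": ["1"], "F": ["o4"],                     # rebased origin/branch
}

# origin/branch moved from 8 to F: not a normal "grow new commits" update.
print(is_fast_forward(graph, "8", "F"))
# Our own commit A, by contrast, is an ordinary extension of 8.
print(is_fast_forward(graph, "8", "A"))
```

Git itself reports the first case as a forced update when you fetch, which is the cue that the upstream abandoned some commits.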


    [1] If branches grow in the "normal", "expected" way, it's really easy for your Git and their Git to figure out which commits your Git needs from them: your origin/master tells you where you saw their master last time, and now their master points further down a longer chain. The commits you need are precisely those on their master that come after the tip of your origin/master.

    If branches are shuffled around in less-typical ways, it's somewhat harder. In the most general case, they simply have to enumerate all their objects by hash IDs, until your Git tells them that they have reached one you already have. The specific details get further complicated by shallow clones.


    It's not impossible

    It's not impossible to tell, and since Git version 2.0 or so, there are now built-in tools to let Git figure it out for you. (Specifically, git merge-base --fork-point, which is invoked by git rebase --fork-point, uses your reflog for origin/branch to figure out that the o--8 chain used to be on origin/branch at one point. This only works for the time-period that those reflog entries are retained, but this defaults to at least 30 days, giving you a month to catch up. That's 30 days in your time-line: 30 days from the time you run git fetch, regardless of how long ago the upstream did the rebase.)
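The idea behind the fork-point computation can be illustrated with a much simplified model. This is not how `git merge-base --fork-point` is actually implemented; `fork_point` and `commits_to_replay` are hypothetical names, the reflog is just a list of past tips of origin/branch (newest first), and the graph uses the labels from the diagram with made-up names for the unnamed `o` commits.

```python
def ancestors(graph, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen, todo = set(), [tip]
    while todo:
        cid = todo.pop()
        if cid not in seen:
            seen.add(cid)
            todo.extend(graph[cid])
    return seen


def fork_point(graph, reflog, local_tip):
    """Return the newest remembered upstream tip that our branch builds on."""
    mine = ancestors(graph, local_tip)
    for old_tip in reflog:  # newest entry first
        if old_tip in mine:
            return old_tip
    return None  # the relevant reflog entries have expired


def commits_to_replay(graph, local_tip, exclude_tip):
    """Commits on our branch that are not reachable from exclude_tip."""
    return ancestors(graph, local_tip) - ancestors(graph, exclude_tip)


graph = {
    "o1": [], "star": ["o1"],
    "o2": ["star"], "1": ["o2"],
    "o3": ["star"], "8": ["o3"], "A": ["8"],
    "o4": ["1"], "F": ["o4"],
}
reflog = ["F", "8"]  # origin/branch pointed to 8 before it pointed to F

point = fork_point(graph, reflog, "A")
with_reflog = commits_to_replay(graph, "A", point)
without_reflog = commits_to_replay(graph, "A", "F")

print(point, sorted(with_reflog), sorted(without_reflog))
```

With the remembered old tip, only `A` is treated as your own work. Without it, the abandoned upstream commits `o3` and `8` look like yours as well, which is exactly the confusion described above.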

    What this really boils down to is that if you and your upstream agree, in advance, that some particular set of branch(es) get rebased, you can arrange to do whatever is required in your repository every time they do this. With a more typical development process, though, you won't expect them to rebase, and if they don't—if they never "abandon" a published commit that you have fetched—then there's nothing you need to recover from.
