Merge upstream branch into fork with rewritten history

Question

Is there a good approach for merging forked repositories that have the same file/folder structure at HEAD but different histories? A workflow that is not fully automated is acceptable, since this won't be done very often, but I hope there is a better way than copying all the files and checking the differences manually. :)

The background is that we had to migrate from a 10-year-old TFS repository to Git. There was a requirement to keep the whole history, but only for the master branch. After migrating from TFS we cleaned up the Git repository a bit, but it's still too big for Git.

We migrated it, and this branch is still being used for current production deployments. Fixes are also being made in that production branch, so it cannot be ditched for some time, and these fixes are important to keep.

In parallel, we are working on major refactorings in a separate branch. A lot of the current production codebase is still used there, but a lot of historical material has been removed or moved to different repositories.

What I wanted to do is make a fork and rewrite its history (e.g. using BFG Repo-Cleaner) to clean out all the removed projects/objects.

This cleanup part worked well. However, we also need to be able to merge changes made on the current production branch (one way only: from production to the cleaned-up repo). I tried to do this by adding an upstream branch from the old repository, but merging the upstream repository into the repository with rewritten history makes all the cleanup useless: it re-adds all the removed objects.

Is there any way to solve this? Maybe such a cleanup can be done in some completely different way? There are a lot of similar questions, but I didn't find exactly what I need. :)

Recommended answer

Update - After reading the comments and reviewing my answer, there are some things that can be clarified, some tweaks that will make it easier to use correctly, and one or two outright errors. Sorry about that; as docs go, the original answer was "rough draft" quality. I'll address a couple of questions first, but I do recommend having a look over the edited answer below as well.

Upstream Configuration - The relationships between branches in each repo are key to what's going on here. The fetch refspecs are going to govern that, and as long as they're set correctly no other "upstream" configuration should be required.

That said, the biggest change I make below is to move the cleaned-up branches in the bridge repository to their own clean/* namespace, so that fetching the right refs into the clean repository is much simpler.

BFG removing original branches - This is correct, but after you configure the bridge repo's origin fetch refspec, the subsequent fetch will recreate the original branches under the prod/* namespace.

As to your last comment - I think your previous attempts just fell victim to errors arising from the "rough draft" problems in the original answer. Getting the right result is absolutely possible, and I guess that, as someone fully comfortable with the tools and techniques here, I automatically looked past the "on-the-fly" corrections needed to make it work. But hopefully this rewrite will at least get you closer to what you're trying to do...

You mention that you may have to merge changes from the production repo to the cleaned-up repo. That's not too bad a problem, but beware that if you need changes to flow in both directions (i.e. if you might want to update the production branch with changes from the cleaned-up repo), that complicates things and might favor a different approach.

Also, this is easiest if all changes flow from a single branch on the production repo into the clean repo. (It doesn't matter if you use branches within the production repo, but you'd ideally want them all to be merged into a single branch, which becomes the source of changes for a single branch in the clean repo.) If not, the same principles can apply, but the execution is harder.

Note that any approach is only as good as the ability to apply patches from production onto the cleaned code base. To the extent that the cleaning only consists of removing certain files, that's no problem. But if the repos diverge wildly, then conflicts when applying the changes will become an ever-increasing problem regardless of anything you might try.

For a one-way flow (prod repo -> cleaned repo), you can keep one repo with both the "original" and the "cleaned" history. This can be the production repo itself, or a dedicated "bridge repository". (It cannot be the cleaned repository, as it would then contain the large history you're trying to remove from it.)

Exactly how to get to that state from where you are depends on the details of where you are. For illustrative purposes, if you had started with this approach in mind, it might go like this:

You have your prod repo at <prod-url>. You clone it, and this clone will be used to make a bridge repository.

$ git clone <prod-url> bridge
$ cd bridge

You run BFG in bridge, and then clone that to create the true "clean" repository. Then (once again in bridge) you reconfigure origin so that its branches are mapped to a prod namespace in the bridge repo.

$ git config remote.origin.fetch 'refs/heads/*:refs/heads/prod/*'

Now, when you fetch from origin into the bridge repo, instead of updating remote-tracking refs, git will try to advance a set of branches in the prod/ namespace. But you do not want those prod/* branches fetched into your clean repo(s); the easiest way to fix that is to move the cleaned-up branches to a clean/ namespace and reconfigure the clean repo to fetch only the clean/* branches.

In bridge, there are several ways to go about moving the branches. If there aren't many, you could do it manually:

$ git checkout master
$ git checkout -b clean/master
$ git branch -D master

For lots of branches, you could script this (perhaps using git for-each-ref to kick things off). Or you could perhaps abuse the filter-branch backup ref mechanism in some way.
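As a rough illustration of the scripted route, the sketch below renames every local branch into the clean/ namespace with git branch -m, which is equivalent to the checkout/branch/delete sequence shown for a single branch. The throwaway repository and the branch names release and feature/x are assumptions made up for the demo; in practice you would run only the loop, inside bridge.

```shell
set -e
# Throwaway repository standing in for the BFG-cleaned bridge repo
repo=$(mktemp -d) && cd "$repo" && git init -q .
git symbolic-ref HEAD refs/heads/master
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "A'"
git branch release && git branch feature/x    # hypothetical branch names

# Rename every local branch into clean/, skipping any branch that
# already lives under clean/ or prod/.
for b in $(git for-each-ref --format='%(refname:lstrip=2)' refs/heads); do
    case "$b" in
        clean/*|prod/*) continue ;;
    esac
    git branch -m "$b" "clean/$b"
done

# List the resulting branch names
git for-each-ref --format='%(refname:short)' refs/heads
```

The branch list is collected before the loop starts, so renaming refs while iterating is not a concern. The prod/* guard only matters if the script is rerun after the original branches have been fetched.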

Anyway, once the branches are moved, go to the clean repo and

$ git config remote.origin.fetch '+refs/heads/clean/*:refs/remotes/origin/*'

Now, taking a step back: unlike in this last command, when I gave a fetch refspec for origin in the bridge repo, I omitted the leading + that is often used on fetch refspecs. That means that if a prod branch undergoes a history rewrite, the fetch will complain, and you'll know you have a potential headache to resolve. More on that later.

So next, in the bridge repo you can run

$ git fetch origin

which will re-load the original branches under the prod/ namespace.
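To see what that refspec does in isolation, here is a small sketch that can be run in a scratch directory. The local prod repository is a made-up stand-in for <prod-url>, and the BFG step is skipped because only the ref mapping is being shown.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Hypothetical stand-in for the production repository
git init -q prod
git -C prod symbolic-ref HEAD refs/heads/master
git -C prod -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "A"

git clone -q prod bridge

# Map every branch on origin to a local branch under prod/.
# No leading '+', so a rewritten prod branch makes the fetch complain.
git -C bridge config remote.origin.fetch 'refs/heads/*:refs/heads/prod/*'
git -C bridge fetch -q origin

# Show the local branches now present in bridge
git -C bridge for-each-ref --format='%(refname)' refs/heads
```

After the fetch, bridge holds its own master next to prod/master, and prod/master points at the same commit as master in the stand-in prod repository.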

Now you have both the original branches (e.g. refs/heads/prod/master) and the clean branches (e.g. refs/heads/clean/master). It could be drawn like this:

A' -- B' -- C' -- D' <--(clean/master)

A -- B -- C -- D <--(prod/master)

The histories are unrelated, and you need to keep it that way. But you also want to "know" that the clean/master branch is "up-to-date" through the D commit on prod/master, in a way that makes merging future changes easy. One way is to create two additional branches - let's call them bridge-prod and bridge-clean.

The bridge-clean branch will stay pointed at the last commit at which we brought changes in from prod. New changes may go on in the clean/ branches themselves, but bridge-clean will remember what a cleaned-up version of prod alone would look like.

$ git checkout clean/master
$ git branch bridge-clean

Then bridge-prod's job is to have the same content as bridge-clean until it receives new changes from prod/master - after which it will be used as a reference for updating bridge-clean once again.

So to initialize that, we create a copy of D' whose parent is D.

$ git checkout prod/master
$ git checkout -b bridge-prod
$ git rm -r ':/'
$ git checkout bridge-clean -- ':/'
$ git commit
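The same commit can also be built without touching the working tree, using the plumbing command git commit-tree. This is an alternative sketch, not the procedure from the answer. The demo repository, the legacy/ directory, and the orphan branch that stands in for the BFG-cleaned history are all assumptions for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# D on prod/master: includes a legacy/ directory that the clean-up removes
git symbolic-ref HEAD refs/heads/prod/master
mkdir legacy && echo old > legacy/old.txt && echo v1 > app.txt
git add . && git commit -q -m "D"

# D' on clean/master: unrelated root commit, stand-in for the BFG result
git checkout -q --orphan clean/master
git rm -rqf legacy && git commit -q -m "D'"
git branch bridge-clean

# D": the tree of bridge-clean recorded as a child of prod/master
new=$(git commit-tree 'bridge-clean^{tree}' -p prod/master \
      -m 'Cleaned snapshot of prod/master')
git branch bridge-prod "$new"

# Sanity check: identical content (git diff --quiet exits 0 when equal)
git diff --quiet bridge-clean bridge-prod && echo "trees match"
```

Either route should end with bridge-prod having the tree of bridge-clean and prod/master as its only parent, which is what makes D the merge base for later merges.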

Now you have

A' -- B' -- C' -- D' <--(bridge-clean)(clean/master)

                  D" <--(bridge-prod)
                /
A -- B -- C -- D <--(prod/master)

where D' and D" have identical content (which is the "cleaned" version of D). Because D" has D as its parent, you can merge future changes from prod/master into bridge-prod (D will be the merge base). So after some time you have

                     ... x <--(clean/master)
                    /
A' -- B' -- C' -- D' <--(bridge-clean)

                  D" <--(bridge-prod)
                /
A -- B -- C -- D ... H <--(prod/master)

The two ... could include many commits, branches, merges, whatever; it doesn't make a big difference. The important thing is that bridge-prod and bridge-clean still represent the last integration between the repos.

So next you want to merge prod/master into bridge-prod.

                     ... x <--(clean/master)
                    /
A' -- B' -- C' -- D' <--(bridge-clean)

                 D" -- H" <--(bridge-prod)
                /     /
A -- B -- C -- D ... H <--(prod/master)

You want H" to represent the cleaned-up state of H. For that, there are two conditions to worry about:

If the prod/master branch updates a file that was removed by the clean-up, then the merge will conflict. Luckily these removals are the only changes on "our" side of the merge, and we know we want to keep them over whatever prod/master might have done to those files. So when we merge we could say

$ git checkout bridge-prod
$ git merge -X ours prod/master

The -X ours option should not be confused with -s ours. While -s ours would use the "ours merge strategy", ignoring the prod/master changes entirely, -X ours uses the default merge strategy with the "ours strategy option" (thanks, git, for the clear-as-mud naming).

What this means is that this command will try to merge as normal, but every time there's a conflict, the bridge-prod version of that hunk of code will prevail. Since the only changes on bridge-prod are removals of files we don't want, this is good. One caveat: git documents this option as resolving conflicting hunks, so a modify/delete conflict (prod changed a file that the clean-up removed) may still stop the merge; if it does, remove the file again with git rm and commit.
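The following sketch plays that merge through in a scratch repository. Everything in the setup is hypothetical: the legacy/ directory, the file names, and the orphan branch that stands in for the BFG-cleaned history. It is written defensively: if the merge stops (as far as I know, -X ours settles conflicting hunks but not necessarily a modify/delete conflict), the script removes the legacy/ paths again and concludes the merge by hand.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# D on prod/master, D' on clean/master (stand-in for the BFG result)
git symbolic-ref HEAD refs/heads/prod/master
mkdir legacy && echo old > legacy/old.txt && echo v1 > app.txt
git add . && git commit -q -m "D"
git checkout -q --orphan clean/master
git rm -rqf legacy && git commit -q -m "D'"

# D": cleaned content on top of D
git branch bridge-prod \
    "$(git commit-tree 'clean/master^{tree}' -p prod/master -m 'D-cleaned')"

# H on prod/master: edits one kept file and one file the clean-up removed
git checkout -q prod/master
echo v2 > app.txt && echo patched >> legacy/old.txt
git commit -qam "H"

git checkout -q bridge-prod
if ! git merge -q --no-edit -X ours prod/master; then
    # The merge stopped, e.g. on a modify/delete conflict:
    # drop the legacy/ paths again and conclude the merge.
    git rm -rqf --ignore-unmatch -- legacy
    git commit -q --no-edit
fi

# Files recorded in the merge result
git ls-tree -r --name-only HEAD
```

Whichever branch of the if is taken, the resulting merge commit keeps the updated app.txt from prod/master and contains nothing under legacy/.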

The other problem would be if prod/master has added a new file that should be excluded by the clean-up. If you know that can't happen, no problem. If it could happen, then you need to check for it. For example, before merging you could say

$ git diff prod/master prod/master^

and see if there are any new files that you wouldn't want in the clean repo. If so, then for your merge do

$ git checkout bridge-prod
$ git merge -X ours --no-commit prod/master
# remove the unwanted files
$ git add ':/'
$ git commit
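Note that comparing prod/master with prod/master^ only looks at the tip commit. A variation (my assumption, not part of the original answer) is to list every file added since the last integration with a three-dot diff against bridge-prod, then drop the unwanted ones during a --no-commit merge. The demo repository, its file names, and the rule that anything under legacy/ is unwanted are all hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# D on prod/master, D' on clean/master (stand-in for the BFG result)
git symbolic-ref HEAD refs/heads/prod/master
mkdir legacy && echo old > legacy/old.txt && echo v1 > app.txt
git add . && git commit -q -m "D"
git checkout -q --orphan clean/master
git rm -rqf legacy && git commit -q -m "D'"
git branch bridge-prod \
    "$(git commit-tree 'clean/master^{tree}' -p prod/master -m 'D-cleaned')"

# H on prod/master adds one wanted file and one unwanted file
git checkout -q prod/master
mkdir src && echo code > src/new.txt && echo junk > legacy/extra.txt
git add . && git commit -q -m "H"

git checkout -q bridge-prod
# Files added on prod/master since its merge base with bridge-prod
added=$(git diff --name-only --diff-filter=A bridge-prod...prod/master)
printf '%s\n' "$added"

git merge -q -X ours --no-commit prod/master
# Demo assumption: anything under legacy/ is unwanted in the clean repo
git rm -rqf --ignore-unmatch -- legacy
git commit -q -m 'Merge prod/master, minus legacy/'
```

The list in added is only a review aid; deciding which of those files belong in the clean repo remains a manual step.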

Now, because D" has the same content as D', H" has the TREE you want in the next bridge-clean commit.

$ git checkout bridge-clean
$ git rm -r ':/'
$ git checkout bridge-prod -- ':/'
$ git commit

This gives you

                     ... x <--(clean/master)
                    /
A' -- B' -- C' -- D' -- H' <--(bridge-clean)

                 D" -- H" <--(bridge-prod)
                /     /
A -- B -- C -- D ... H <--(prod/master)

H' has the same content as H" - which is the sanitized content, updated through H. Also, H' has sanitized history (its parent is D', which we cleaned up at the outset), so it can safely be included in the clean repo. You can merge bridge-clean into clean/master and the change transfer is complete.
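Putting the whole round trip together, a hedged end-to-end sketch could look like this. It runs in a scratch repository with invented content, uses an orphan branch as a stand-in for the BFG-cleaned history, and builds D" and H' with git commit-tree instead of the git rm / git checkout sequence, so treat it as an illustration of the flow, not a drop-in script.

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# D on prod/master, D' on clean/master (stand-in for the BFG result)
git symbolic-ref HEAD refs/heads/prod/master
mkdir legacy && echo old > legacy/old.txt && echo v1 > app.txt
git add . && git commit -q -m "D"
git checkout -q --orphan clean/master
git rm -rqf legacy && git commit -q -m "D'"
git branch bridge-clean
git branch bridge-prod \
    "$(git commit-tree 'bridge-clean^{tree}' -p prod/master -m 'D-cleaned')"

# x: independent work on clean/master; H: new work on prod/master
echo clean-only > notes.txt && git add notes.txt && git commit -q -m "x"
git checkout -q prod/master && echo v2 > app.txt && git commit -qam "H"

# H": merge the new prod work into bridge-prod
git checkout -q bridge-prod
git merge -q --no-edit -X ours prod/master

# H': the tree of bridge-prod recorded on top of bridge-clean
new=$(git commit-tree 'bridge-prod^{tree}' -p bridge-clean \
      -m 'Integrate prod/master (cleaned)')
git update-ref refs/heads/bridge-clean "$new"

# Bring the integrated changes into the branch the clean repo fetches
git checkout -q clean/master
git merge -q --no-edit bridge-clean
```

At the end, clean/master contains the prod change to app.txt alongside its own notes.txt, and it still shares no commits with prod/master.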

This is conceptually a bit involved, and takes some up-front setup (and maybe writing a few scripts to use with each integration of changes). But once that's all set up, it minimizes the manual fiddling and lets you make the best applicable use of the merge machinery git provides.

However, it's a one-way bridge. If you were to merge bridge-prod back into prod/master, you would almost certainly delete files that you want kept in prod/master.

If you do have to take changes from the clean repo and apply them to the prod repo, you could generate a patch on the clean repo. To the extent that the clean repo content is a subset of the prod repo content, the patch should apply without too much hassle. It might cause some spurious conflicts the next time you merge changes from prod down to clean.
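One possible way to do that is with git format-patch and git am. In the sketch below a single scratch repository holds both histories as a stand-in for the two real repositories, and the file contents and commit message are made up.

```shell
set -e
work=$(mktemp -d) && mkdir "$work/repo" && cd "$work/repo" && git init -q .
git config user.name demo && git config user.email demo@example.com

# D on prod/master, D' on clean/master (stand-in for the BFG result)
git symbolic-ref HEAD refs/heads/prod/master
mkdir legacy && echo old > legacy/old.txt && printf 'one\ntwo\n' > app.txt
git add . && git commit -q -m "D"
git checkout -q --orphan clean/master
git rm -rqf legacy && git commit -q -m "D'"

# A change made on the clean side that prod should also receive
printf 'one\ntwo\nthree\n' > app.txt && git commit -qam "Add line three"

# Export the tip commit of clean/master as a mailbox-style patch
git format-patch -1 --stdout clean/master > "$work/fix.patch"

# Apply it on the production side as a new commit
git checkout -q prod/master
git am -q "$work/fix.patch"
```

Because the patch only touches a file that both sides share, it applies without involving legacy/. The commit it creates on prod/master is unrelated to the one on clean/master as far as git is concerned, which is where the spurious conflicts on the next prod-to-clean merge can come from.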

One last additional point (mentioned above but then forgotten) - This all assumes that you won't be doing history rewrites in the prod repo going forward (or at least not often). If you were to do such a rewrite, then just as another user's clone couldn't cleanly pull changes, the bridge wouldn't work normally for integrating the change into the clean repo. You'd have to work out a procedure based on the specifics of the situation.
