Add original hash to commit on git rebase (with new root)

Problem description

I have a codebase that used to be managed with SVN, but is now managed with git. When the code was migrated to git, the history was lost.

I have managed to recover the SVN-history, and am now trying to git-rebase the more recent commits over the top.

I have two branches, git-commits, which contains the commits since the migration to git, and svn-commits which contains the older history. Each branch contains over 3000 commits.

I have found that the following command builds the new history on top of the old (albeit with some manual merge conflict handling):

git rebase git-commits --root --onto svn-commits --preserve-merges

Several of the commits reference commit hashes, and I am aware that these would change when the rebase is done. So that this information is not lost forever, I would like to add the original commit hash of each commit to the newly-rebased commit's message.

This would mean that an original commit like this:

commit aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Author: Boaty McBoatface <boaty@example.com>
AuthorDate: Wed Jul 27 00:00:00 1938 +0000
Commit: Boaty McBoatface <boaty@example.com>
CommitDate: Wed Jul 27 00:00:00 1938 +0000

Reticulate splines

The splines had been derezzed, and needed to be reticulated.

would become something like this:

commit bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Author: Boaty McBoatface <boaty@example.com>
AuthorDate: Wed Jul 27 00:00:00 1938 +0000
Commit:     Meshy <meshy@example.com>
CommitDate: Wed Nov 16 10:23:31 2016 +0000

Reticulate splines

The splines had been derezzed, and needed to be reticulated.

Original hash: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Is this possible? Perhaps using git-filter-branch?

Recommended answer

First, a note: be sure you really want to do this, since git replace (mentioned briefly below) can be used to stitch together the histories in a way that preserves the IDs. It has its own drawbacks too, of course; search for reports from people who have used it.

Yes, you can do this with git filter-branch.

You might, though, want to combine the "rebase new commits atop new conversion" step with the "... and then edit all the new commits to also contain their old IDs" step, because rebase works by copying commits, and filter-branch works by ... copying commits. :-)

All Git commands that do this kind of thing must copy, since the hash ID of each commit is a function of the commit's contents. If the new commit is different from the original commit in any way, it gets a new, different ID.
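A small sketch of that point, run in a throwaway repository. The temporary directory, the demo identity and the pinned dates are assumptions made for illustration and are not part of the question. Two commit objects built from identical inputs get the same ID, while appending an "Original hash:" line to the message yields a different one.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Pin every input that goes into a commit object (demo values only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='2016-11-16 10:23:31 +0000'
export GIT_COMMITTER_DATE='2016-11-16 10:23:31 +0000'

tree=$(git write-tree)   # tree object for the (empty) index

# Same tree, same metadata, same message: the same commit object both times.
same_1=$(echo 'Reticulate splines' | git commit-tree "$tree")
same_2=$(echo 'Reticulate splines' | git commit-tree "$tree")

# Only the message differs, which is enough to change the ID.
changed=$(printf 'Reticulate splines\n\nOriginal hash: %s\n' "$same_1" |
          git commit-tree "$tree")

echo "identical contents: $same_1 $same_2"
echo "edited message:     $changed"
```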

The differences between git rebase and git filter-branch lie in which commits are copied and how the copying is performed.

Rebase, when done without --preserve-merges, works by selecting a list of non-merge commits, turning each such commit into a changeset (via subtraction, more or less: child minus parent = delta from parent to child), then adding this delta to the --onto point or to the commits-added-so-far.

When you use --preserve-merges, rebase still selects a list of non-merge commits. Then, where there was a merge commit, rebase re-performs the merge (which is why you must resolve merge conflicts all over again). It must re-merge, because the new base may result in a different merge, and because merges cannot be turned into a single changeset ("child - parent" gives you one delta, but there are at least two parents, hence at least two deltas, and in the general case we cannot preserve both).

Filter-branch uses an entirely different approach. The commits to be filtered are selected regardless of whether they are merges or not. (The actual selection is done by running git rev-list, which is the "plumbing" equivalent of git log.) This complete list of commit IDs is placed into a pile: a sorted, topological-order pile stored in an ordinary file, so that parent commits are always processed before their children.

Then, for each ID in the list:

  • Extract the original commit a la git checkout, into a temporary tree that has no underlying Git repo.

  • Apply the tree filter to modify the tree. (This modification runs in the temporary directory that holds the temporary tree. That part trips up a lot of people doing their first tree-filter, when they try to access a file like ../../fixed-version. The relative path fails because the temporary tree is not in the repository at all.)

  • Reconstruct a new set of Git tree-and-blob-objects representing the new tree, i.e., the new commit snapshot.

  • Apply the commit message filter to the message.

  • Apply the commit environment filter to the remaining commit metadata (author and committer stuff).

  • Make a new commit using the new message and new tree. Or, if you supply a commit filter, use it to make-or-don't-make the commit; and you can also modify the new commit's parent(s) at this point, using the parent filter.

  • Last, record a pairing: "old commit <oldhash> became new commit <newhash>." (If you skip a commit using a commit filter, the old hash maps instead to its corresponding new ancestor, i.e., the parent that you didn't skip.) This pairing is a map.

This process is extremely slow due to the extract + tree-filter + rebuild part. Therefore, if you don't use a tree filter, git filter-branch skips this part: it's just going to get the original tree back anyway. To let you modify the new commit's contents anyway, filter-branch also lets you specify an index filter (commits always work from the index anyway, so the extract+modify+rebuild just updates the index; if we can update in place, that's much faster). But—here's the key point—for your purposes you don't need to do anything at all to each tree. All you want is to modify the parentage! This will let you preserve your original merges and their source trees, with no re-merging.

Note that the --commit-filter description talks about the map convenience function (shell function). This "map" function uses the map I mentioned above. The default is to automatically map to the new parent of the new copied commit.

Finally, after copying all the commits—and, if you provide a --tag-name-filter, also copying annotated tags and mapping the copies (so if you do have annotated tags, you do want a --tag-name-filter cat here)—the filter-branch command rewrites some references, i.e., branch and tag names. The original references, which will still point to the original commits (and annotated tag objects), are dumped into the refs/original/ name-space. (This must be empty at the start of the process unless you use --force.) The rewritten references point into the new copies. The rewrite uses the same mapping technique, so that if there are skipped commits, the names now point to the retained ancestor commits.

("Some" references? Wait, which references? The answer is in the documentation, but it's a bit mysterious: it talks about positive references. The arguments get passed to git rev-list so that you can filter a specific range of commits, e.g., branch~30..branch or branch ^otherbranch. The "positive" references are the ones that actively select commits, while the "negative" references are the ones that limit commits, so for branch ^otherbranch we have one positive reference, branch, and one negative, the not-otherbranch part. So this rewrites only refs/heads/branch and not refs/heads/otherbranch.)
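To make the positive/negative distinction concrete, here is a minimal sketch in a scratch repository. The branch names follow the example in the text; the empty commits and the demo identity are assumptions for illustration. Only commits reachable from the positive reference and not from the negative one are selected, and those are the ones filter-branch would copy.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com

# Two shared commits on otherbranch...
git symbolic-ref HEAD refs/heads/otherbranch
git commit -q --allow-empty -m 'shared 1'
git commit -q --allow-empty -m 'shared 2'

# ...then two more that exist only on branch.
git checkout -q -b branch
git commit -q --allow-empty -m 'only on branch 1'
git commit -q --allow-empty -m 'only on branch 2'

# "branch" is the positive reference, "^otherbranch" the negative one.
selected=$(git rev-list --count branch ^otherbranch)
echo "commits selected: $selected"
```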

The reason to explain all of the above is to point out how simple the transplant process is, when using git filter-branch, and then to show how to access the map.

First, we only need to explicitly replace one single parent ID. Specifically, we want the parent of the root commit in git-commits to become the existing tip commit of svn-commits:

$ git rev-parse svn-commits
9999999999999...

(that's the desired new parent), and:

$ git rev-list --max-parents=0 git-commits
11111111111111...

(that's the root commit—with any luck there is only one, otherwise, now what?).
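The "only one root" assumption can be checked before going further. The sketch below builds a tiny stand-in repository so that it runs on its own. The branch names come from the question; the commits, the demo identity and the variable names new_parent, old_root and root_count are made up for illustration. It refuses to continue if git-commits has more than one root.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com

# Stand-in for the recovered SVN history.
git symbolic-ref HEAD refs/heads/svn-commits
git commit -q --allow-empty -m 'Old SVN commit'

# Stand-in for the unrelated post-migration history.
git checkout -q --orphan git-commits
git commit -q --allow-empty -m 'Reticulate splines'
git commit -q --allow-empty -m 'Second git commit'

new_parent=$(git rev-parse svn-commits)

# Count the parentless commits reachable from git-commits.
root_count=$(git rev-list --count --max-parents=0 git-commits)
if [ "$root_count" -ne 1 ]; then
    echo "expected exactly one root, found $root_count" >&2
    exit 1
fi

old_root=$(git rev-list --max-parents=0 git-commits)
echo "reparent $old_root onto $new_parent"
```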

So, we would want a parent filter that says: "if this is commit 1111111... then echo 9999999..., else just echo the arguments back". The default parent arguments are on stdin, as a series of -p <id>s, with the IDs already mapped. Of course, an existing root has no parents, so stdin will have no contents for the one commit we want to change here. Hence:

--parent-filter 'if [ $GIT_COMMIT = 11111... ]; then
  echo -p 999999...; else cat; fi'

This part of the filter-branch will accomplish our re-parenting. Note that unlike git rebase, all the trees are simply retained intact. We never convert a snapshot to a delta here, we just take it as-is. This means there is no need to re-resolve merge conflicts.

(Side note: you can actually use the name svn-commits in place of the hard-coded 99999... here. You could use a name in place of the hard-coded 11111... as well but we don't have a name. Also, looking up the name each time will add a tiny bit of delay to the filtering. For the one re-parenting to svn-commits, that's one tiny delay; for testing whether this is the old root, though, that would be one tiny delay times 3000 commits.)

(Second side note: you can also do this reparenting via "grafts" or its more modern version, git replace. If a graft or replacement is in force when you run filter-branch, that graft or replacement becomes permanent, since Git simply copies the commits as instructed, with the instructions also following the replacement.)
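For comparison, a hedged sketch of the git replace route on a scratch repository (the commits and the demo identity are illustrative assumptions). It uses git replace --graft to make the old root appear to have the svn-commits tip as its parent. As long as no filter-branch is run afterwards, the branch tip keeps its original ID, and the joined history is visible only while replacements are honored.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com
git symbolic-ref HEAD refs/heads/svn-commits
git commit -q --allow-empty -m 'Old SVN commit'
git checkout -q --orphan git-commits
git commit -q --allow-empty -m 'Reticulate splines'
git commit -q --allow-empty -m 'Second git commit'

tip_before=$(git rev-parse git-commits)
old_root=$(git rev-list --max-parents=0 git-commits)

# Record a replacement: treat old_root as if its parent were the svn-commits tip.
git replace --graft "$old_root" svn-commits

tip_after=$(git rev-parse git-commits)
# Commit counts with the replacement honored and with it ignored.
joined=$(git rev-list --count git-commits)
unjoined=$(git --no-replace-objects rev-list --count git-commits)
echo "tip: $tip_before -> $tip_after; commits seen: $joined (joined) vs $unjoined (raw)"
```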

That still leaves the problem of filtering the commit messages, to add:

Original hash: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

As shown above, the original hash is in $GIT_COMMIT, so all we need is this:

--msg-filter 'cat; echo; echo "Original hash: $GIT_COMMIT"'

If we wanted to be fancy, we could even use that map convenience function:

--msg-filter 'cat; echo; echo "new commit $(map $GIT_COMMIT) \
filtered to reparent original commit $GIT_COMMIT"'

or something silly like that, but there's no good reason to bother ... unless you want to get really fancy, and see if you can detect old hash IDs in the commit message and rewrite them in place. I'm not sure if this is even a good idea, and won't attempt to provide a bit of shell script for it, but note that all [1] of these filters are "eval"-ed as shell fragments. You can invoke other shell scripts from these eval-ed fragments; just remember that all the filtering is going on in a temporary directory.

Run the filtering on the reference git-commits. Once the filtering is done, refs/heads/git-commits will point to the last copied commit, and refs/original/refs/heads/git-commits will point to the original chain (the one rooted at 11111... in the above examples).

[1] Well, almost all. As the documentation says, "with the notable exception of the commit filter, for technical reasons".

We need two filters, --parent-filter (or a graft or replacement in force), and --msg-filter. The parent filter says "replace the root of the transplanted copy with the tip of the place we're transplanting onto", and this accomplishes our rebase-without-changing-snapshots. The message filter says "this new commit replaces the commit whose ID we expanded at filtering-time from the variable $GIT_COMMIT".
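Put together, the two filters might be combined as in the sketch below. It runs against a small stand-in repository so it can be tried safely. The branch names come from the question; the file, the commits, the demo identity and the exported variable names TRANSPLANT_ONTO and TRANSPLANT_ROOT are hypothetical. The variables are exported so the eval-ed filter fragments can read them instead of hard-coding the two hashes.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name demo
git config user.email demo@example.com

# Stand-ins for the two unrelated histories.
git symbolic-ref HEAD refs/heads/svn-commits
echo old > file.txt; git add file.txt; git commit -q -m 'Old SVN commit'
git checkout -q --orphan git-commits
echo new > file.txt; git add file.txt; git commit -q -m 'Reticulate splines'
echo newer > file.txt; git commit -q -am 'Second git commit'

# Hypothetical names, exported for use inside the filters.
TRANSPLANT_ONTO=$(git rev-parse svn-commits)
TRANSPLANT_ROOT=$(git rev-list --max-parents=0 git-commits)
export TRANSPLANT_ONTO TRANSPLANT_ROOT
# Recent Git versions print a warning and pause before filtering; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Reparent the old root onto the svn-commits tip and tag every copied
# commit with the hash of the commit it was copied from.
git filter-branch \
    --parent-filter 'if [ "$GIT_COMMIT" = "$TRANSPLANT_ROOT" ]; then
                         echo "-p $TRANSPLANT_ONTO"; else cat; fi' \
    --msg-filter 'cat; echo; echo "Original hash: $GIT_COMMIT"' \
    -- git-commits

git log --format='%h %s%n%b' git-commits
```

Once the result has been checked, the backup under refs/original/ can be removed (for example with git update-ref -d refs/original/refs/heads/git-commits). Until then it is the easiest way back to the original chain.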
