How do I merge changes in Git in files that I moved?


Problem description


I moved some directories.

When I merge, there are many conflicting files, since other developers have committed their changes. Both egit Merge Tool and git mergetool say that the file was deleted locally or remotely. See image.

How do I merge these changes?

Solution

File history and rename detection

You never really need to worry about "preserving history" in Git. Git does not have file history at all; it has only commit history. That is, each commit "points to" (contains the hash ID of) its parent or, for a merge, both of its parents, and this is the history: commit E is preceded by commit D, commit D is preceded by commit C, and so on. As long as you have the commits, you have the history.

That said, Git can try to synthesize the history of one specific file, using git log --follow. You specify a starting commit and a path name, and Git checks, commit-by-commit, to see if the file was renamed when comparing the current commit's parent to the current commit. This uses Git's rename detection to identify that file a/b.txt in commit L (left) is "the same file" as file c/d.txt in commit R (right).
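As a minimal sketch of what git log --follow adds, the script below builds a throwaway repository (the paths a/b.txt and c/d.txt mirror the example above; the repository itself and the user identity are made up) and compares the log for the new path with and without --follow.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir a c
printf 'line 1\nline 2\nline 3\n' > a/b.txt
git add a/b.txt
git commit -q -m "add a/b.txt"

git mv a/b.txt c/d.txt       # pure rename, no content change
git commit -q -m "move a/b.txt to c/d.txt"

# Without --follow, only commits touching the path c/d.txt are listed.
plain=$(git log --oneline -- c/d.txt | wc -l)
# With --follow, Git keeps going across the detected rename.
followed=$(git log --oneline --follow -- c/d.txt | wc -l)
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```

The plain log stops at the rename commit because the path c/d.txt did not exist before it; --follow switches to the old name once the rename is detected.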

Rename detection has a lot of fiddly knobs, but at the base level, it's basically this:

  • Git looks at all the file names in commit L.
  • Git looks at all the file names in commit R.
  • If a file name vanishes from L and a different one appears in R (say a/b.txt is gone and c/d.txt is all-new), that pair is a candidate for a detected rename.
  • Now that there are candidates (unpaired L files and unpaired R files), Git compares the contents of these unpaired files.

Unpaired files go into a pairing queue (one for L, one for R), and Git hashes the contents of all the files. It already has the internal Git hash so it compares all those directly, first. If a file is completely unchanged, it has the same Git hash ID (but different names) in L and R, and can be immediately paired-up and removed from the pairing queues.

Now that exact-matches are taken out, Git tries the long slow slog. It takes one unpaired L file, and computes a "similarity index" for every R file. If some R file is sufficiently similar—or several are—it takes the "most similar" R file and pairs it with the L file. If no file is sufficiently similar, the L file remains unpaired (is taken out of the queue) and is considered "deleted from L". Eventually there are no files in the unpaired L queue, and whatever files remain in the unpaired R queue, those files are "added" (new in R). Meanwhile, all paired-up files have been renamed.

What this means is: when comparing (git diff) commit L to R, if two files are sufficiently similar, they get paired up as a rename. The default similarity threshold is 50%, so the files need to be at least a 50% match (whatever that means; the similarity index computation is somewhat opaque), but an exact match is much easier and faster for Git.
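The pairing can be observed with git diff --name-status. The following sketch uses invented file names in a scratch repository: one file is moved untouched, the other is moved and has a single line edited. It then compares the default threshold with -M100%, which limits detection to exact renames.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir a c
seq 1 20 | sed 's/^/line /' > a/exact.txt
seq 1 20 | sed 's/^/row /'  > a/edited.txt
git add a
git commit -q -m "L: two files under a/"

git mv a/exact.txt  c/exact.txt
git mv a/edited.txt c/edited.txt
# Change one line out of twenty in the second file.
sed 's/^row 7$/row seven/' c/edited.txt > edited.tmp && mv edited.tmp c/edited.txt
git add c
git commit -q -m "R: move both files to c/, edit one of them"

# Default threshold: both files should be paired up as renames.
default=$(git diff -M --name-status HEAD~1 HEAD)
# Exact renames only: the edited file falls back to delete + add.
exact_only=$(git diff -M100% --name-status HEAD~1 HEAD)

echo "with -M:";     echo "$default"
echo "with -M100%:"; echo "$exact_only"
```

The untouched file is reported with status R100 (an exact match); the score shown for the edited file depends on Git's similarity computation, so no particular value is claimed here.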

Note that git log --follow enables rename detection (on just one target R file, as we're working backwards through the log, comparing the parent commit to just the one file whose name we know in the child). Since Git version 2.9, both git diff and git log -p now have rename detection turned on automatically. In older versions, you had to use the -M option to set the similarity threshold, or configure diff.renames to true, to get git diff and git log -p to do rename detection.
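To see the effect of the diff.renames setting without touching your real configuration, one option is to override it per command with git -c. This sketch (file names are invented) runs the same diff with the setting off, with it on, and with it off but -M given on the command line.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'alpha\nbeta\ngamma\n' > old.txt
git add old.txt
git commit -q -m "add old.txt"
git mv old.txt new.txt
git commit -q -m "rename old.txt to new.txt"

# Pre-2.9 default behaviour: no rename detection, so a delete plus an add.
off=$(git -c diff.renames=false diff --name-status HEAD~1 HEAD)
# Configured on (the default in newer versions): one rename entry.
on=$(git -c diff.renames=true diff --name-status HEAD~1 HEAD)
# The -M option asks for rename detection even when the config says no.
flag=$(git -c diff.renames=false diff -M --name-status HEAD~1 HEAD)

echo "diff.renames=false:";    echo "$off"
echo "diff.renames=true:";     echo "$on"
echo "diff.renames=false -M:"; echo "$flag"
```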

There is also a maximum length for the pairing queues. The default has been doubled twice, once in Git 1.5.6 and once in Git 1.7.5. You can control it yourself: it is configurable as diff.renameLimit and merge.renameLimit. At the time this answer was written, the defaults were 400 and 1000 respectively; later Git releases raised them, so check the git-config documentation for your version. (If you set these to zero, Git uses its own internal maximum, which can chew up enormous amounts of CPU time; that is why these two limits exist in the first place. If you set diff.renameLimit but not merge.renameLimit, git merge uses your diff setting.)
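Setting the two limits is ordinary git config usage. The values 2000 and 5000 below are arbitrary examples, not recommendations, and the script writes them only into a scratch repository's local configuration.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q

# Per-repository settings; add --global to apply them to every repository.
git config diff.renameLimit 2000
git config merge.renameLimit 5000

diff_limit=$(git config --get diff.renameLimit)
merge_limit=$(git config --get merge.renameLimit)
echo "diff.renameLimit=$diff_limit merge.renameLimit=$merge_limit"

# A one-off override for a single command is also possible, for example:
#   git -c merge.renameLimit=10000 merge <branchname>
```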

This leads to a rule of thumb that applies to git log --follow: If possible, when you intend to rename some file or set of files, commit the rename step by itself, without changing any of the file contents. If possible, keep the number of renamed files fairly small: at or below 400, for instance. You can commit more renames in multiple steps, 400 at a time. But remember that you're trading off git log --follow ability and speed against cluttering up your history with pointless commits: if you need to rename 50000 files, maybe you should just do it.
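The rule of thumb can be sketched as two commits: one that only renames, and a later one that changes content. The paths src/util.txt and lib/util.txt are invented for the illustration.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir src lib
seq 1 20 | sed 's/^/line /' > src/util.txt
git add src
git commit -q -m "add src/util.txt"

# Step 1: the rename on its own, with no content change.
git mv src/util.txt lib/util.txt
git commit -q -m "rename src/util.txt to lib/util.txt"

# Step 2: content changes go into a separate commit.
echo "line 21" >> lib/util.txt
git commit -q -am "extend lib/util.txt"

# The rename-only commit is an exact match, so it pairs up by hash alone.
rename_status=$(git diff -M --name-status HEAD~2 HEAD~1)
echo "$rename_status"
```

Because the rename commit leaves the blob untouched, Git reports it as R100 without having to run the slower similarity comparison.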

But how does this affect merging? Well, git merge, like git log --follow, always turns on rename detection. But which commit is L, and which commit or commits are R?

Merging and rename detection

Whenever you run:

git merge <commit-specifier>

Git has to find the merge base between your current (HEAD) commit and the specified other commit. (Usually this is just git merge <branchname>. That selects the tip commit of that other branch by resolving the branch name to the commit to which it points. By the definition of "branch name" in Git, that's the tip commit of that branch, so that this "just works". But you can specify any commit by hash ID, for instance.) Let's call this merge base commit B (for base). We already know that our own commit is HEAD, though some things call this "local". Let's call the other commit O (for other), though some things call this "remote" (which is silly: nothing in Git is remote!).
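You can ask Git for the merge base directly with git merge-base. In this sketch the branch names mainline and other, and the files, are made up; the commit where the branches fork is recorded so it can be compared with what git merge-base reports.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git checkout -q -b mainline

echo base > file.txt
git add file.txt
git commit -q -m "B: common starting point"
fork=$(git rev-parse HEAD)   # the commit both branches share

git checkout -q -b other
echo other >> file.txt
git commit -q -am "O: work on the other branch"

git checkout -q mainline
echo head > second.txt
git add second.txt
git commit -q -m "HEAD: work on mainline"

# B in the text: the merge base of HEAD and the other commit.
base=$(git merge-base HEAD other)
echo "merge base: $base"
```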

Git then does, in effect, two git diffs. One compares B vs HEAD, so for this particular diff, L is B and R is HEAD. Git will detect, or fail to detect, renames according to the rules we saw above. Then Git does the other git diff, which compares B to O. Git will detect or fail to detect renames according to the same rules yet again.

If some file is renamed in B-vs-HEAD, Git diffs its contents as usual. If some file is renamed in B-vs-O, Git diffs its contents as usual. If a single B file F is renamed to two different names in HEAD and O, Git declares a rename/rename conflict on that file, and leaves both names in the work-tree for you to clean up. If it's renamed in only one diff (it's still called F in either HEAD or O), then Git stores the file in the work-tree using the new name from whichever side renamed it. In any case, Git tries to combine the two sets of changes (from B-vs-HEAD and B-vs-O) as usual. [1]
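The one-sided rename case can be reproduced in a scratch repository. In this sketch (branch and file names are invented) the other branch renames F.txt to G.txt while mainline edits F.txt; the two diffs against the merge base are printed, then the merge is run.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git checkout -q -b mainline

seq 1 20 | sed 's/^/line /' > F.txt
git add F.txt
git commit -q -m "B: add F.txt"

git checkout -q -b other
git mv F.txt G.txt
git commit -q -m "O: rename F.txt to G.txt"

git checkout -q mainline
sed 's/^line 5$/line five/' F.txt > F.tmp && mv F.tmp F.txt
git commit -q -am "HEAD: edit F.txt"

base=$(git merge-base HEAD other)
head_diff=$(git diff --find-renames --name-status "$base" HEAD)
other_diff=$(git diff --find-renames --name-status "$base" other)
echo "B-vs-HEAD: $head_diff"
echo "B-vs-O:    $other_diff"

# The edit from HEAD should be carried over to the new name from O.
git merge -q --no-edit other
ls
```

After the merge, the work-tree holds G.txt (the name from the side that renamed) containing the edit made on mainline.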

Of course, for Git to detect the rename, the contents of the file must be sufficiently similar, as always. This is particularly problematic for Java files (and sometimes Python as well), where the file names become embedded in import statements. If a module consists mostly of import statements, with just a few lines of code of their own, the rename-induced changes will overwhelm the remaining file contents, and the files will not be even a 50% match.

There is a solution, though it is a bit ugly. As with the rule of thumb for git log --follow, we can commit just the renames first, and then commit the content-changing "fix all the imports" as a separate commit. Then, when we go to merge, we can do two or even three merges:

git checkout ...  # whatever branch we plan to merge into
git merge <hash>  # merge with everything just before the Great Renaming

Since no files are renamed, this merge will go as well, or as poorly, as usual. Here's the result, in graph form. Note that the hash we supplied to the git merge command was the hash of commit A, just before R that does all the renames:

...--*--o--...--o--M    <-- mainline
      \           /
       o--o--...-A--R--...--o   <-- develop, with renames at R

Then:

git merge <hash of R>

Commit R changes only names, so every file's content is identical in R and in the merge base, which is now commit A. The effect of this merge is therefore merely to pick up all the renames: we keep the file contents from HEAD commit M, but the names from R. This merge should succeed automatically:

...--*--o--...--o--M--N    <-- mainline
      \           /  /
       o--o--...-A--R--...--o   <-- develop, with renames at R

and now we can git merge develop to proceed to merge the development branch.
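Here is the whole sequence as a sketch in a scratch repository. The directory names old and new, the file Mod.txt, and the commit messages are all invented; A and R play the roles they have in the graphs above, and the edits are placed far apart so that no content conflicts arise.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git checkout -q -b mainline

mkdir old
seq 1 20 | sed 's/^/line /' > old/Mod.txt
git add old
git commit -q -m "start"

git checkout -q -b develop
echo notes > notes.txt
git add notes.txt
git commit -q -m "A: last commit before the renames"
A=$(git rev-parse HEAD)
git mv old new
git commit -q -m "R: renames only"
R=$(git rev-parse HEAD)
sed 's/^line 1$/line one (develop)/' new/Mod.txt > Mod.tmp && mv Mod.tmp new/Mod.txt
git commit -q -am "content fixes after the renames"

git checkout -q mainline
sed 's/^line 20$/line twenty (mainline)/' old/Mod.txt > Mod.tmp && mv Mod.tmp old/Mod.txt
git commit -q -am "mainline work"

git merge -q --no-edit "$A"      # merge M: everything before the renames
git merge -q --no-edit "$R"      # merge N: picks up just the renames
git merge -q --no-edit develop   # final merge: content changes after R
git log --oneline --graph
```

In a real repository the first and last merges may of course hit ordinary content conflicts; the point of the split is that the middle merge only has exact renames to detect.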

In many cases, we won't need to make merge M, but it might not be a bad idea to do it anyway if we need to make merge N just for all the renames. The reason is that commit R is not functional: it has the wrong names for imports. Commit R must be skipped during bisection. This means that merge N is similarly non-functional and must be skipped during bisection. It might be good to have M present, since M could actually work.
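Skipping a known-broken commit during bisection is done with git bisect skip. The sketch below uses a made-up linear history of five commits in which the third stands in for the non-functional rename-only commit R.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for n in 1 2 3 4 5; do
  echo "$n" > version.txt
  git add version.txt
  git commit -q -m "commit $n"
done

good=$(git rev-parse HEAD~4)   # oldest commit, assumed good
R=$(git rev-parse HEAD~2)      # stands in for the non-functional commit R
bad=$(git rev-parse HEAD)      # newest commit, assumed bad

git bisect start "$bad" "$good" >/dev/null 2>&1
# Tell bisect never to ask about R; it picks a neighbouring commit instead.
git bisect skip "$R" >/dev/null 2>&1

# Bisect records the skip as a ref under refs/bisect/.
skip_ref=$(git rev-parse -q --verify "refs/bisect/skip-$R")
echo "skipped: $skip_ref"

git bisect reset >/dev/null 2>&1
```

When bisecting with git bisect run, a test script can achieve the same thing by exiting with status 125 for commits that cannot be tested.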

Note that if you do any of this, you are distorting / contorting your source code just to please your version control system. This is not a good situation. It may be less bad than your other alternatives, but don't tell yourself it's good.


[1] I still need to see what happens to the two copies of the file when there is a rename/rename conflict. Since Git leaves both names in the work-tree, do both names contain the same merged contents, plus any conflict markers if needed? That is, if the file was named base.txt and is now named head.txt and other.txt, do the work-tree versions of head.txt and other.txt always match?
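One way to investigate this yourself is to provoke a rename/rename conflict in a scratch repository and inspect the result. The sketch below reuses the names base.txt, head.txt and other.txt from the footnote; the branch names are invented. It makes no claim about the answer, which may depend on the Git version and merge strategy in use.

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, purely illustrative
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git checkout -q -b mainline

seq 1 20 | sed 's/^/line /' > base.txt
git add base.txt
git commit -q -m "B: add base.txt"

git checkout -q -b other
git mv base.txt other.txt
git commit -q -m "O: rename base.txt to other.txt"

git checkout -q mainline
git mv base.txt head.txt
git commit -q -m "HEAD: rename base.txt to head.txt"

# The merge is expected to stop with a conflict, so capture its exit status.
merge_output=$(git merge --no-edit other 2>&1) && merged=yes || merged=no
echo "$merge_output"
echo "merge completed cleanly: $merged"

git status --short             # how Git reports each of the three names
git ls-files -u                # unmerged index entries and their stages

# Compare the two work-tree copies, if both are present.
if [ -e head.txt ] && [ -e other.txt ]; then
  cmp -s head.txt other.txt && echo "copies match" || echo "copies differ"
fi
```

To probe the interesting case, add different content edits on each branch before merging and compare the two files again.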
