Need to get all file differences (added, modified, renamed) between two Git commits

Question

I'm trying to export all files with differences between two commits, those differences being:

  • New files (Added)
  • Modified files
  • Renamed files
  • If possible, information on any deleted files

Detecting renames may be a tough one, as I will be doing the export in a Windows 7 environment, where somefile.php is the same as SomeFile.php; but I will be uploading the files to a *nix environment, which does treat those names as different, so they need to be recognized and exported if possible.

I was using the below command:

git diff-tree -r --no-commit-id --name-only --diff-filter=ACMRT $head_commit_id $older_commit_id | xargs tar -cf project.tar -T -

However I noticed it was not exporting new/added files and also was not exporting renamed files; I then found out that git diff-tree doesn't do rename detection by default, so from what I can see I would need to add --find-renames to the command?

Answer

As in CodeWizard's answer, you can use the "user-friendly" (or porcelain) command git diff instead of git diff-tree, which is what Git calls a plumbing command, meant for use in scripts. You should, however, be aware of what this means.

Since porcelain commands are meant for humans, they try to present things in human-readable fashion. This means they obey any settings that a particular user has configured for himself or herself in the various configuration files. That includes the diff.renames and diff.renameLimit configurations. They may also modify their output to make it easier for eyeballs, but harder for computer programs, to deal with. Worst of all, they may change their output from one Git version to another, if people seem to prefer some new default.

Plumbing commands, on the other hand, are meant for scripts, so they behave in predictable ways, with output that does not change and does not depend on configuration items. That way, you get exactly what you request: reliable output in a reliable form, so that if you write your own reliable code, it will not just work today, for one case; it will keep working in the future, for all cases where it can.¹

In the end, what this means is that if you use git diff-tree and set the right flags, you will get more reliable output. If you use git diff, your rename detection depends on the user's configuration (settings such as diff.renames and diff.renameLimit).
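To see the difference concretely, here is a small sketch, assuming bash and a commit range that contains a pure rename; the function name compare_rename_views is hypothetical. It runs the same comparison three times, overriding diff.renames for a single command with git -c. The git-config documentation states that diff.renames affects only porcelain commands such as git diff and git log, so only the two git diff views should differ from each other.

```shell
# Hypothetical helper: compare porcelain and plumbing output for one range.
# Usage: compare_rename_views <older-commit> <newer-commit>
compare_rename_views() {
  local old=$1 new=$2
  echo "== git diff, diff.renames=true"
  git -c diff.renames=true diff --name-status "$old" "$new"
  echo "== git diff, diff.renames=false"
  git -c diff.renames=false diff --name-status "$old" "$new"
  # diff-tree is plumbing: it does not read diff.renames at all, so
  # renames are reported only if you ask for them with -M / --find-renames.
  echo "== git diff-tree, diff.renames=true (ignored)"
  git -c diff.renames=true diff-tree -r --name-status "$old" "$new"
}
```

After a commit that only renames old.php to new.php, you would expect the first view to show a single R100 line carrying both names, and the other two to show the same change as an A for new.php plus a D for old.php.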

As you discovered, the output from rename-detection is two pathnames, which is not something you can just pipe to an archiver. Archivers in general have issues with file deletion (this is, perhaps, one classic difference between archives and backups / snapshots; note that both of these are related to version control as well).
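If you do decide to turn rename detection on (for example with -M, the short form of --find-renames), each rename record carries two pathnames, so the list has to be flattened before an archiver can read it. The sketch below, assuming bash and a hypothetical helper name new_side_paths, parses the NUL-terminated -z form of --name-status and emits only the new-side name of each added, modified, or renamed file.

```shell
# Hypothetical helper: print the new-side path of every added (A), modified (M)
# or renamed (R) file between two commits, one NUL-terminated name per file.
# Usage: new_side_paths <older-commit> <newer-commit>
new_side_paths() {
  local old=$1 new=$2 status path src dst
  # With -z, each record is "STATUS\0path\0", or "Rnnn\0old\0new\0" for a
  # rename (nnn is the similarity index).
  git diff-tree -r -z --no-commit-id --name-status -M --diff-filter=AMR "$old" "$new" |
  while IFS= read -r -d '' status; do
    case $status in
      R*) # Rename: read the old name, keep only the new one.
          IFS= read -r -d '' src
          IFS= read -r -d '' dst
          printf '%s\0' "$dst" ;;
      *)  # A or M: a single pathname follows.
          IFS= read -r -d '' path
          printf '%s\0' "$path" ;;
    esac
  done
}
```

Because the output is NUL-terminated, it can feed, for example, GNU tar: new_side_paths "$older_commit_id" "$head_commit_id" | tar --null -T - -cf project.tar. Using -z also keeps Git from quoting unusual pathnames, which would otherwise confuse the archiver.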

If your goal is a sort of union of all files—i.e., if the diff says that a file named A was added, one named D was deleted, and file R was created by renaming the old name O (and perhaps also modifying it: note Git's similarity index number that comes after the letter R), then you wish to collect file A, ignore file D, and collect file R while ignoring file O—well, then, what you want is to not detect renames in the first place! If you do not detect renames—which git diff-tree does not by default—this same diff will be presented as: add file A, delete file D, delete file O, and add file R. So a git diff-tree with a diff-filter that includes AM and excludes D suffices. It is less clear what to do with T, which is for a type-change: from ordinary file to symbolic link, for instance, or from file to sub-repository commit hash (what Git calls a gitlink entry, for a submodule).
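Put together, the approach just described might look like the following sketch, assuming bash, a tar that understands --null and -T (GNU tar does), and a hypothetical function name export_changed. Two details differ from the command in the question. First, the older commit is passed first: diff-tree describes the change from the first tree to the second, so with the arguments reversed, files added since the older commit are reported as D and dropped by the filter. Second, -z and --null keep unusual file names intact. The files are read from the working tree, so this assumes the newer commit is checked out and the function is run from the top of the working tree.

```shell
# Hypothetical helper: archive every file added or modified between two commits.
# Usage: export_changed <older-commit> <newer-commit> <archive.tar>
export_changed() {
  local old=$1 new=$2 out=$3
  # -r: recurse into subdirectories; -z: NUL-terminated names.
  # --no-renames is already diff-tree's default and is spelled out only to
  # document the intent: a rename then shows up as "D old" plus "A new",
  # and the A in the filter picks up the new name.
  git diff-tree -r -z --no-commit-id --name-only --no-renames \
      --diff-filter=AM "$old" "$new" |
    tar --null -T - -cf "$out"
}
```

For example: export_changed "$older_commit_id" "$head_commit_id" project.tar.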

Similarly, you don't want to enable copy detection: a C status, like R, presents a similarity index and a pair of pathnames. If you leave it disabled, you simply get the new pathname as an Added file.

Even if you do all this, you are still set up for a pitfall. Suppose that commit hash C1 has a file named problem, and a (presumably later) commit hash C2 has instead two files named problem/A and problem/B. This implies that the original file problem was deleted between these two points, because most systems (including Git itself) forbid having both a file named problem and a directory named problem holding various files. Given that each tar-archive itself is not a complete snapshot—you omit files that are unmodified between C1 and C2—your procedure for extracting these snapshots must necessarily be additive: extract earlier snapshot, then extract later snapshot atop earlier snapshot. This process will fail at the point where file problem is in the way of creating directory problem. Obviously, you can check for such problems and remove the problematic file (you can see now why I named the file problem :-) ), but more generally, since you are not storing "delete" directives in the first place, you won't know, in a future case where you are using these archives to rebuild a snapshot, that some files don't belong in that snapshot at all.

(The classic solution to this problem is to prefix update-archives with some kind of manifest or directive. If you decide to use such a solution, then, depending on the kind of detail you want in the manifest-or-directive, you might want to do a first pass to detect exact renames and/or exact copies.)
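One way to sketch such a manifest, assuming bash, a tar that supports --null, -T, -r and -O (GNU tar does), and a hypothetical manifest file name .deleted-paths that no tracked file uses: record the D entries from the same diff-tree comparison inside the archive, and have the extraction side delete those paths before unpacking. This covers the file-to-directory case above, because the deleted file problem is removed before problem/A and problem/B are written. It does not try to clean up directories that end up empty. As before, the producing side assumes it runs from the top of a working tree checked out at the newer commit; make_update_archive and apply_update_archive are hypothetical names.

```shell
# Hypothetical manifest name; must not collide with any tracked path.
MANIFEST=.deleted-paths

# Usage: make_update_archive <older-commit> <newer-commit> <archive.tar>
make_update_archive() {
  local old=$1 new=$2 out=$3 tmp
  tmp=$(mktemp -d)
  # Paths present in $old but gone in $new, NUL-terminated.
  git diff-tree -r -z --no-commit-id --name-only --no-renames \
      --diff-filter=D "$old" "$new" > "$tmp/$MANIFEST"
  # Added/modified files come from the working tree.
  git diff-tree -r -z --no-commit-id --name-only --no-renames \
      --diff-filter=AM "$old" "$new" |
    tar --null -T - -cf "$out"
  # Append the manifest at the archive root.
  tar -rf "$out" -C "$tmp" "$MANIFEST"
  rm -rf "$tmp"
}

# Usage: apply_update_archive <archive.tar> <destination-directory>
apply_update_archive() {
  local archive=$1 dest=$2 path
  # Delete first, so a deleted file "problem" cannot block a new "problem/".
  tar -xOf "$archive" "$MANIFEST" | while IFS= read -r -d '' path; do
    rm -rf -- "$dest/$path"
  done
  # Then unpack everything except the manifest itself.
  tar -x --exclude="$MANIFEST" -f "$archive" -C "$dest"
}
```

For example, make_update_archive "$older_commit_id" "$head_commit_id" update.tar on the exporting machine, then apply_update_archive update.tar followed by the directory holding the earlier snapshot on the receiving side.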


¹ Obviously, newly added features can present problems for everyone, not just scripts and not just humans, but the Git folks do work hard on not creating unnecessary problems for scripts that rely on plumbing commands. Consider, for instance, the new impetus to push Git toward using some flavor of SHA-256 instead of, or in addition to, SHA-1. Since SHA-1 produces 160-bit hashes and SHA-256 produces 256-bit hashes, these must be represented as 40 and 64 hexadecimal digits respectively. Linus suggested abbreviating 256-bit hashes to 40 characters by default, to help out existing scripts that assume 40 characters, but I foresee some problems... :-)
