Why do pre-commit hooks leave files "half staged"?


Question

I'm trying to set up a pre-commit hook for formatting code, one that would format files and include the changes in the commit. Several scripts say they do this, but the ones I tried all have the same problem: they leave files "half staged".

See for example this script. It properly adds files after modifying them, and says it should work on Windows. The fact that hooks don't work for me when they work for other people leads me to believe that something is up with my environment.

This happens when the hook modifies a file with a superfluous line break:

$ git status -s
A  src/hello.c

$ git commit src/hello.c

Add 'Hello World!'
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Changes to be committed:
#   new file:   src/hello.c
#
# Changes not staged for commit:
#   modified:   src/hello.c
#

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   src/hello.c

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   src/hello.c

$ git diff
warning: LF will be replaced by CRLF in src/hello.c.
The file will have its original line endings in your working directory
diff --git a/src/hello.c b/src/hello.c
index 5e4b595..768d31a 100644
--- a/src/hello.c
+++ b/src/hello.c
@@ -1,6 +1,5 @@
 #include <stdio.h>

-
 int main() {
   printf("Hello, World!");
   return 0;

$ git diff --staged
diff --git a/src/hello.c b/src/hello.c
index 768d31a..5e4b595 100644
--- a/src/hello.c
+++ b/src/hello.c
@@ -1,5 +1,6 @@
 #include <stdio.h>

+
 int main() {
   printf("Hello, World!");
   return 0;

I would have expected the hook to leave a clean index. Instead, the staged copy does not include the hook's modification, while the file itself is left modified. Why does this behavior occur and how can I make it stop?

Answer

Warning: this answer is kind of long, but that's because it's really about all the pitfalls of this sort of pre-commit hook. There are several, and it gets complicated in complex cases.


You didn't show the hook directly, but you did link to the GitHub repository containing the hook (here's a more direct link to the hook itself). I will quote a few lines from the hook.

The hook makes some rather brash assumptions, because when you run git commit, there are at least three of what I like to call "active copies" of each file, and this hook is not sophisticated enough to notice discrepancies between them.

Three copies of files, sometimes with different content

The three copies are:

  • The committed copy in the current or HEAD commit. This file literally cannot be changed—it's frozen for all time—but it is important because it's the basis we will use for comparisons.

  • The index copy. This file can be changed. It's what you are proposing to commit: if your pre-commit and commit-message hooks permit the commit and all else goes right, the copy of the file that's in the index is the copy that will be committed. Hence, you can think of the index—which Git also calls the staging area—as, essentially, the proposed next commit.

    These first two files—frozen HEAD copy, and index copy—are in a special, Git-only, compressed format. While the index copy can be changed, that's always done by replacing it, typically using git add to overwrite it. The git add command compresses a file into the Git-only format and places the compressed copy—well, technically, a reference to the compressed copy—into the index.

  • The work-tree copy. This file is an ordinary file that you can see and manipulate.
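To see the three copies side by side, a throwaway repository helps. What follows is a minimal sketch, assuming git is on the PATH; the temporary repository, the file name hello.txt, its contents, and the demo identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: give one file three different "active copies", then print each one.
# The temp repo, hello.txt, and the identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'version 1' > hello.txt
git add hello.txt
git commit -q -m 'first commit'   # HEAD copy: "version 1"

echo 'version 2' > hello.txt
git add hello.txt                 # index copy: "version 2"
echo 'version 3' > hello.txt      # work-tree copy: "version 3"

head_copy=$(git show HEAD:hello.txt)   # frozen, committed copy
index_copy=$(git show :hello.txt)      # proposed next commit
tree_copy=$(cat hello.txt)             # ordinary file on disk

printf 'HEAD:      %s\nindex:     %s\nwork-tree: %s\n' \
  "$head_copy" "$index_copy" "$tree_copy"
git status --porcelain   # first letter: HEAD vs index; second: index vs work-tree
```

In this setup the final command should report MM hello.txt, since the index differs from HEAD and the work-tree differs from the index.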

Now, you're using Git's LF/CRLF translation, as indicated by:

warning: LF will be replaced by CRLF in src/hello.c

The actual translation happens when Git copies the file from the work-tree to the index—i.e., during git add—or when it copies the file from the index to the work-tree, e.g., during git checkout. The extract-to-work-tree step changes LF-only line endings to CRLF line endings; the add-to-index step changes CRLF line endings to LF-only line endings. (You can control this and change it around somewhat, but that's the usual scheme.)
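The add-to-index direction of this translation can be observed by comparing sizes. This is a sketch under the assumption that core.autocrlf is set to true in the repository and that no attributes override it; the repository and the file crlf.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: with core.autocrlf=true, `git add` stores LF-only content in the
# index even though the work-tree file has CRLF endings. Names are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config core.autocrlf true

printf 'a\r\nb\r\n' > crlf.txt            # 6 bytes in the work-tree
git add crlf.txt 2>/dev/null              # Git may warn about the conversion

tree_size=$(($(wc -c < crlf.txt)))        # size of the work-tree copy
index_size=$(git cat-file -s :crlf.txt)   # size of the index copy
echo "work-tree: $tree_size bytes, index: $index_size bytes"
git ls-files --eol crlf.txt               # line-ending summary per copy
```

The two missing bytes in the index copy are the two carriage returns that git add stripped.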

git status, git add, and the existing hook

Let's go to the script now, and look at a few lines:

for line in $(git status -s)

(technically this should be git status --porcelain, but at the moment they pretty much do the same thing: the main danger is that the --short output could be colorized, which would break the next bit)

  if [[ $line == A* || $line == M* ]]

Now it's time to consider what git status prints. The documentation says, about the short format:

... the status of each path is shown as one of these forms

   XY PATH
   XY ORIG_PATH -> PATH

where ORIG_PATH is where the renamed/copied contents came from. ORIG_PATH is only shown when the entry is renamed or copied. The XY is a two-letter status code. [snippage] X shows the status of the index, and Y shows the status of the work tree.

(Aside: copied is not currently a possible status for git status. The internal diff engine that git status invokes can set this, but to do so, the caller has to enable it, and git status just doesn't. If git status got new command-line flags or configuration entries that enabled copy detection, you could get C status-es, but as of now you cannot.)

The key item here is that the first letter, which is what the script is testing here, is based on the status of the index. That is, it's a summary of the result of comparing the HEAD commit to the index—to the proposed commit. A file will be Added if it's new in the index (does not appear in the HEAD commit), or Modified if it's in both the index and the HEAD commit, but the index copy is different from the HEAD commit.

The thing to realize here is that whether or not the index copy matches the HEAD copy, the work-tree copy is a third file entirely. It might be quite different from one or both of these other two copies! That's OK, and in fact, that's deliberately the case if you use git add -p to selectively stage only part of the work-tree file. Just keep it in mind as we plow on.
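As an illustration of the XY layout, here is one way a script could split a status line into its parts. The helper name split_status_line is hypothetical and is not part of the quoted hook; the sketch assumes the v1 --porcelain format and paths without special characters (Git quotes those).

```shell
#!/bin/sh
# split_status_line: hypothetical helper, not part of the quoted hook.
# Splits one line of `git status --porcelain` (v1 format) into X, Y and path,
# printing them as "X|Y|PATH".
split_status_line() {
  ssl_x=$(printf '%s' "$1" | cut -c1)      # X: HEAD vs index
  ssl_y=$(printf '%s' "$1" | cut -c2)      # Y: index vs work-tree
  ssl_path=$(printf '%s' "$1" | cut -c4-)  # text after "XY "
  case $ssl_x in
    R) ssl_path=${ssl_path#* -> } ;;       # renamed: keep only the new name
  esac
  printf '%s|%s|%s\n' "$ssl_x" "$ssl_y" "$ssl_path"
}

# Read whole lines (not whitespace-separated words) from real status output.
git status --porcelain 2>/dev/null | while IFS= read -r line; do
  split_status_line "$line"
done
```

Reading with while IFS= read -r keeps each status line intact, including the leading space that appears when X is blank.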

Now let's go back to the pre-commit hook script:

    if [[ $line == *.c || $line == *.cc || $line == *.h || $line == *.cpp ]]
    then
      # format the file
      clang-format -i -style=file $(pwd)/${line:3}

      # and then add the file (so that any formatting changes get committed)
      git add $(pwd)/${line:3}
    fi

If the file name at the end of the line ends in .c, .cc, etc., this runs clang-format. For A and M status files the line holds just one file name; only R status files would have two. The script is faulty in not checking for R status files, though, since a file could be both renamed and modified.

The input to clang-format is the work-tree file. The input almost certainly should be the index copy of the file, but it isn't. So the script assumes that the index and work-tree copy match.

Having run clang-format, the script then runs git add to copy the (updated) work-tree file back into the index. If we wanted to do this right, we'd need to format the index copy and then add the formatted index copy, which is pretty tricky. That's probably why the script is a little lazy, but it's definitely worth noting.
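For the curious, formatting the index copy directly might look roughly like the sketch below. The helper format_index_copy is hypothetical, and tr stands in for a real formatter; with clang-format one would presumably read from stdin and pass something like -assume-filename, which is an assumption here and not something the quoted hook does. The sketch also assumes the path is given relative to the top of the repository.

```shell
#!/bin/sh
# format_index_copy PATH FORMATTER [ARGS...]: hypothetical helper.
# Pipes the index copy of PATH through FORMATTER (a stdin-to-stdout filter),
# stores the result as a new blob, and points the index entry at that blob.
# The work-tree file is never read or written.
format_index_copy() {
  fic_path=$1
  shift
  # `git ls-files -s` prints "<mode> <blob> <stage><TAB><path>"; keep the mode.
  fic_mode=$(git ls-files -s -- "$fic_path" | cut -d' ' -f1)
  # Only the exit status of the last stage (hash-object) is checked here.
  fic_blob=$(git cat-file blob ":$fic_path" | "$@" | git hash-object -w --stdin) || return 1
  git update-index --cacheinfo "$fic_mode,$fic_blob,$fic_path"
}

# Demo in a throwaway repo; names and contents are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
echo 'staged text' > demo.txt
git add demo.txt
echo 'unstaged edit' > demo.txt          # work-tree now differs from index
format_index_copy demo.txt tr a-z A-Z    # stand-in for a real formatter
index_after=$(git show :demo.txt)
tree_after=$(cat demo.txt)
echo "index: $index_after / work-tree: $tree_after"
```

Note that this leaves the work-tree copy unformatted, so the file would still show as modified afterwards unless the work-tree copy is formatted separately.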

The work-tree file written by clang-format is probably going to have LF-only line endings (see https://reviews.llvm.org/D19031). This fits with the text of the warning:

warning: LF will be replaced by CRLF in src/hello.c.

This is telling you that the current work-tree copy, src/hello.c, has LF-only line endings. Git has been told that when Git copies from index back to work-tree, Git should change LF-only endings to CRLF endings.

More than three copies

Now things get complicated. I mentioned above that there are at least three copies of each file, and then described the places these three copies live. There's the HEAD commit, the index, and the work-tree. The one flaw with this description is the phrase the index, as Git will sometimes use a temporary index. That's the case for some git commit commands, but not for all of them.

The full story of git commit is that it always builds your new commit from an index, but not necessarily from the index. There is a "the" index: one particular, distinguished index that goes with the work-tree.[1] Then there are extra index files that some Git commands create for various purposes; e.g., git stash creates a temporary index to save the work-tree, and git filter-branch creates lots of temporary index files as it runs. Here, though, we're interested in git commit, and git commit will sometimes create one or two of its own temporary index files.

If you run git commit—with no extra arguments at all—git commit just uses the index file. That's your proposed commit, and it already has all the files in it. If your pre-commit hook runs git add, it copies new files into the index, displacing the old ones that were in the index, and eventually git commit writes out the new commit using the new files. If the new files came from the work-tree, things mostly match up, except maybe for CRLF line endings.

But if you run git commit --only or git commit --include, or even just git commit -a, things take a twist. For instance, if you run git commit file1.cc, that means git commit --only file1.cc, unless you add --include, in which case it means git commit --include file1.cc.

To do these operations—actually including plain git commit—Git makes at least one temporary index file, although for plain git commit this happens as late as possible. One temporary index file is named index.lock (well, .git/index.lock, depending on where your .git directory is). This temporary index will be the true source of files for the new commit. When the commit is finished, if it all succeeds, Git releases the lock by renaming .git/index.lock to be .git/index.

We can see these in action through a dummy .git/hooks/pre-commit that just prints the value of the environment variable $GIT_INDEX_FILE, then exits with a failure status to prevent the commit:

$ cat .git/hooks/pre-commit
$ git commit
$GIT_INDEX_FILE is .git/index
$ git commit -a
$GIT_INDEX_FILE is [path]/git/.git/index.lock
$ git commit --only cache.h
$GIT_INDEX_FILE is [path]/.git/next-index-53061.lock
$ git commit --include cache.h
$GIT_INDEX_FILE is [path]/.git/index.lock
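To reproduce this kind of experiment, a hook along the following lines should do. This is a sketch, not necessarily the exact hook used above: the hook body is my guess at something equivalent, and the repository, identity, and file contents are illustrative.

```shell
#!/bin/sh
# Sketch: install a dummy pre-commit hook that reports $GIT_INDEX_FILE and
# then fails, so no further commit is made. Names here are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo 'int x;' > cache.h
git add cache.h
git commit -q -m 'initial commit'        # made before the hook exists

mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "\$GIT_INDEX_FILE is $GIT_INDEX_FILE"
exit 1
EOF
chmod +x .git/hooks/pre-commit

echo 'int y;' >> cache.h
git add cache.h

# Hook output may arrive on stderr, hence 2>&1. Both commits should fail.
plain_out=$(git commit -m demo 2>&1);               plain_rc=$?
only_out=$(git commit -m demo --only cache.h 2>&1); only_rc=$?
printf '%s\n%s\n' "$plain_out" "$only_out"
```

The exact paths printed depend on the Git version and repository location, so compare the file names and not the full paths.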

So:

  • A plain git commit uses the regular index file. If your hook runs git add you'll replace files in the index. When Git gets around to creating the lock file index.lock, it creates it from index, and when git commit finishes (assuming success), your changes to the index, made by your hook, will take effect.

  • A git commit -a or git commit --include works similarly. The lock is made earlier, but git add should update the index.lock in place, and when git commit finishes, your main index should have the updates. (I have not tested this but it seems obvious.)

  • But git commit --only makes a temporary index (next-index-53061.lock) as well as locking the main index and git adding the --only files to the main locked index. When the commit finishes, the new commit's files will be those from the temporary index, including anything you updated; but the main index will come from index.lock, which is the old index with the specific files updated. When they got updated will control what is actually in that index.
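A hook that still wants to restage files could at least detect the --only case and back off. The sketch below uses a hypothetical helper, uses_temporary_index, that matches the next-index-*.lock name seen in the output above. That name is an implementation detail of Git, not a documented interface, so treat the pattern as an assumption.

```shell
#!/bin/sh
# uses_temporary_index: hypothetical helper for a hook.
# Succeeds when the given index path looks like the temporary index that
# `git commit --only` was seen to use (next-index-<number>.lock).
# The name is a Git implementation detail, not a documented interface.
uses_temporary_index() {
  case ${1##*/} in                 # strip directories, keep the file name
    next-index-*.lock) return 0 ;;
    *) return 1 ;;
  esac
}

# Inside a pre-commit hook one might write:
if uses_temporary_index "${GIT_INDEX_FILE:-.git/index}"; then
  echo "partial commit: refusing to restage files" >&2
fi
```

Declining to touch the index in that case avoids having the commit and the main index end up with different content.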


[1] When you use git worktree add to create an additional work-tree, the extra work-tree gets its own singular index, so "the index" is the one that is paired with "the work-tree": an added work-tree is a separate work-tree with a separate index. The added work-tree also gets its own HEAD, which makes things especially complicated on Windows, but we don't need to go there.


Conclusion

These are the pitfalls to be aware of in commit hooks. The implication of all of this is that, unless you want to get intimate with the innards of Git itself (checking the value of $GIT_INDEX_FILE, for instance, and/or adding things to multiple index files), it's usually a bad idea to have a hook modify the commit in progress. Instead, it's usually wiser to check the commit-in-progress. If the commit is good, let it proceed. If not, remind the user to run whatever is required, and have the commit fail.
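A check-only hook in that spirit could be sketched as follows. The helper check_staged_formatting is hypothetical; it takes the formatter as a stdin-to-stdout filter, and cat and tr are used as stand-ins so the sketch runs without clang-format. A real clang-format call would probably need a per-file wrapper (for example to pass the file name), which is left out here.

```shell
#!/bin/sh
# check_staged_formatting FORMATTER [ARGS...]: hypothetical check-only helper.
# Pipes each staged file's index copy through FORMATTER and reports files
# whose formatted output differs. It never modifies the index or work-tree.
check_staged_formatting() {
  csf_bad=0
  # Added/copied/modified/renamed paths in the index, relative to HEAD.
  # (Paths with unusual characters would need -z handling; omitted here.)
  csf_list=$(git diff --cached --name-only --diff-filter=ACMR)
  while IFS= read -r csf_path; do
    [ -n "$csf_path" ] || continue
    csf_old=$(git rev-parse ":$csf_path")   # blob id currently staged
    csf_new=$(git cat-file blob ":$csf_path" | "$@" | git hash-object --stdin)
    if [ "$csf_old" != "$csf_new" ]; then
      echo "needs formatting: $csf_path" >&2
      csf_bad=1
    fi
  done <<EOF
$csf_list
EOF
  return "$csf_bad"
}

# Demo in a throwaway repo; names and contents are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
printf 'abc\n' > a.c
git add a.c
check_staged_formatting cat;        rc_clean=$?   # identity "formatter"
check_staged_formatting tr a-z A-Z; rc_dirty=$?   # changes lowercase text
echo "clean: $rc_clean, dirty: $rc_dirty"
```

In a pre-commit hook, a non-zero return would become the hook's exit status, so the commit fails and the user is told which files to format and re-add.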

You can modify the commit-in-progress; you just have to be aware of these weird cases.
