How do I run a code formatter over my source without modifying git history?

Question

I am trying to format an entire repo using a code formatter tool. In doing so, I want to keep information about who committed which line, so that commands like git blame still show the correct information. By this, I mean it should show the author that previously edited each line (before it was formatted).

There is the git filter-branch command, which lets you run a command against every revision of the repo, starting from the very first commit:

# Reformats every JS/JSX file in every commit of every branch (slow: checks out each commit)
git filter-branch --tree-filter '
  npx prettier --write "src/main/web/app/**/*.{js,jsx}" ||
  echo "Error: no JS files found or invalid syntax"' \
  -- --all

It would take forever to run this, and I really don't care about the past. I just want to format the master branch going forward without changing the ownership of each line. How can I do this? I tried playing with the rev-list arguments at the end and with other filter types, but it still doesn't work. There must be a way to format the codebase while preserving the author information for each line.

Recommended answer

What you are trying to do is impossible. You cannot, at some point in time, change a line of code, and yet have git report that the most recent change to that line of code is something that happened before that point in time.

I suppose a source control tool could support the idea of an "unimportant change", where you mark a commit as cosmetic and history analysis then skips over that commit. I'm not sure how the tool would verify that the change really was cosmetic, and without some form of enforcement the feature would assuredly be misused, potentially hiding newly introduced bugs in "unimportant" commits. But really the reasons I think it's a bad idea are academic here: the bottom line is that git historically had no such feature. Since Git 2.23, however, git blame does support it: the --ignore-rev and --ignore-revs-file options, or the blame.ignoreRevsFile config setting, make blame skip the listed commits. Nothing verifies that those commits really were cosmetic, so the concern above still applies.
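
Since Git 2.23, `git blame` can skip commits you name, which covers this use case for blame specifically. Below is a minimal sketch in a throwaway repository. The file `app.js`, its contents and the authors `alice` and `bot` are made up, the second commit stands in for a prettier run, and `.git-blame-ignore-revs` is only a commonly used file name, not something git requires.

```shell
#!/bin/sh
# Sketch (needs Git 2.23+): keep a formatting-only commit out of `git blame`.
# Throwaway repo; the file, its contents and the authors are made up.
set -e

commit_as() {
  # commit_as NAME MESSAGE: commit the staged changes with NAME as author and committer
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.com" \
    git commit -q -m "$2"
}

blame_authors() {
  # Print the author of every line of app.js, space-separated.
  git blame --line-porcelain "$@" -- app.js | sed -n 's/^author //p' | paste -s -d ' ' -
}

cd "$(mktemp -d)"
git init -q

# alice writes the code.
printf 'const total=price*quantity\nconsole.log(total)\n' > app.js
git add app.js
commit_as alice "Add total"

# bot reformats it (a stand-in for running prettier).
printf 'const total = price * quantity;\nconsole.log(total);\n' > app.js
git add app.js
commit_as bot "Reformat with prettier"
fmt=$(git rev-parse HEAD)

plain=$(blame_authors)                        # both lines blamed on bot
ignored=$(blame_authors --ignore-rev "$fmt")  # reformat skipped: alice again

# Repo-wide alternative: list the commit in a file and point blame at it.
echo "$fmt" > .git-blame-ignore-revs
git config blame.ignoreRevsFile .git-blame-ignore-revs
configured=$(blame_authors)                   # plain blame now skips it too

echo "plain:      $plain"
echo "ignored:    $ignored"
echo "configured: $configured"
```

Here each reformatted line closely resembles the line it replaced, so blame can hand it back to the earlier commit. Lines that a skipped commit added with no similar counterpart in its parent can still be attributed to the skipped commit; the `blame.markUnblamableLines` setting flags those.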

You can change the formatting going forward. You can preserve the visibility of past changes. You can avoid editing history. But you cannot do all three at the same time, so you're going to have to decide which one to sacrifice.

There are actually a couple of downsides to the history rewrite, by the way. You mentioned processing time, so let's look at that first:

As you've noted, the straightforward way to do this with filter-branch would be very time-consuming. There are things you can do to speed it up (like giving it a ramdisk for its working tree), but it's a tree-filter, so it has to check out and process every version of every file.
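
filter-branch's `-d <directory>` option chooses where the tree filter checks each commit out, so pointing it at RAM-backed storage is one way to apply the ramdisk trick. Below is a minimal sketch, assuming a Linux machine where `/dev/shm` is a tmpfs (it falls back to `$TMPDIR` or `/tmp` otherwise) and using a toy `sed` rewrite as a stand-in for prettier, run against a throwaway repository.

```shell
#!/bin/sh
# Sketch: point filter-branch's scratch checkout (-d) at RAM-backed storage.
# Assumptions: /dev/shm is a tmpfs (typical on Linux; otherwise fall back to
# $TMPDIR or /tmp), and a toy sed rewrite stands in for prettier.
# Throwaway repo only: filter-branch rewrites every ref it is given.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the "use filter-repo" warning and its pause
export GIT_AUTHOR_NAME=alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=alice GIT_COMMITTER_EMAIL=alice@example.com

cd "$(mktemp -d)"
git init -q
printf 'x=1\n' > app.js
git add app.js && git commit -q -m "Add x"
printf 'x=1\ny=2\n' > app.js
git add app.js && git commit -q -m "Add y"

# -d must name a directory that does not exist yet; filter-branch creates it.
if [ -d /dev/shm ]; then base=/dev/shm; else base=${TMPDIR:-/tmp}; fi
scratch="$base/filter-branch-$$"

git filter-branch -d "$scratch" --tree-filter '
  for f in *.js; do
    [ -f "$f" ] || continue
    sed "s/ *= */ = /g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done' -- --all

git show HEAD:app.js    # "x = 1" and "y = 2"
```

For a real repository, the same `-d` option goes in front of the `--tree-filter` in the question's command. It saves disk I/O, but every commit is still checked out and reformatted.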

If you did some pre-processing, you could be somewhat more efficient. For example, you might preprocess every BLOB in the database once to build a mapping (wherever a TREE contains BLOB X, replace it with BLOB Y), and then use an index-filter to perform the substitutions. This avoids all the checkout and add operations, and it avoids repeatedly reformatting the same file contents, which saves a lot of I/O. But it's non-trivial to set up, and it might still be time-consuming.
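
Below is a rough sketch of the blob-mapping idea. It assumes a toy `sed` rewrite in place of prettier, touches only `*.js` paths, and runs in a throwaway repository. The map is a directory (`$MAP`, a hypothetical name) holding one file per original blob ID, whose content is the ID of the formatted version. The index-filter then swaps IDs with `git update-index --index-info` without checking anything out.

```shell
#!/bin/sh
# Sketch: pre-format every *.js blob once, then swap blob IDs per commit with
# an index-filter. Assumptions: a toy sed rewrite stands in for prettier,
# MAP is a hypothetical scratch directory, and this runs in a throwaway repo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=alice GIT_COMMITTER_EMAIL=alice@example.com

toy_format() { sed 's/ *= */ = /g'; }   # hypothetical stand-in for prettier

cd "$(mktemp -d)"
git init -q
printf 'a=1\n' > app.js
printf 'x=1\n' > notes.txt
git add . && git commit -q -m "First"
printf 'a=1\nb=2\n' > app.js
git add . && git commit -q -m "Second"

# 1. Build the map. rev-list lists each object once, so every distinct blob
#    is formatted exactly once, however many commits contain it.
MAP=$(mktemp -d)
export MAP   # the index-filter runs in another process and needs to see it
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
  while read -r type sha path; do
    [ "$type" = blob ] || continue
    case $path in *.js) ;; *) continue ;; esac
    git cat-file blob "$sha" | toy_format | git hash-object -w --stdin > "$MAP/$sha"
  done

# 2. Rewrite history without a checkout: for each *.js index entry whose
#    blob was mapped, point the entry at the formatted blob instead.
git filter-branch --index-filter '
  git ls-files -s | while read -r mode sha stage path; do
    case $path in *.js) ;; *) continue ;; esac
    [ -f "$MAP/$sha" ] || continue
    printf "%s %s\t%s\n" "$mode" "$(cat "$MAP/$sha")" "$path"
  done | git update-index --index-info' -- --all

git show HEAD:app.js     # "a = 1" and "b = 2"
git show HEAD:notes.txt  # untouched: "x=1"
```

Caveats of this sketch: `git rev-list --objects` reports each blob under only one of its paths, so a blob that also appears under a non-`.js` name may not get mapped, and paths with whitespace or characters that `ls-files` quotes would need `-z` handling.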

(It's possible to write a more specialized tool based on this same principle, but AFAIK nobody has written one. There is precedent that more specialized tools can be faster than filter-branch...)

Even if you find a solution that runs fast enough, bear in mind that the history rewrite will change every one of your refs. As with any history rewrite, all users of the repo will need to update their clones. For something this sweeping, I recommend that everyone throw their clones away before you start the rewrite and re-clone afterward.

That also means anything that depends on commit IDs will break. (That could include build infrastructure or release documentation, depending on your project's practices.)

So a history rewrite is a pretty drastic solution. On the other hand, it also seems drastic to conclude that formatting the code is impossible simply because it wasn't done from day 1. So my advice:

Do the reformatting in a new commit. If you use git blame and it points you to the commit where the reformatting occurred, follow up by running git blame again on the reformat commit's parent.
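
Below is a sketch of that two-step workflow in a throwaway repository. The file, its contents and the authors (`alice`, `bot`) are made up. Step 2 reuses line 1 only because this toy reformat did not add or remove lines. In a real file, you may first need to find the matching line number in the parent.

```shell
#!/bin/sh
# Sketch: blame a line, and if the hit is the reformat commit, blame the same
# line again in that commit's parent. Throwaway repo; names are made up.
set -e

commit_as() {
  # commit_as NAME MESSAGE: commit the staged changes with NAME as author and committer
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.com" \
    git commit -q -m "$2"
}

cd "$(mktemp -d)"
git init -q
printf 'const total=price*quantity\n' > app.js
git add app.js
commit_as alice "Add total"
printf 'const total = price * quantity;\n' > app.js   # stand-in for prettier
git add app.js
commit_as bot "Reformat with prettier"

# Step 1: ordinary blame of line 1 lands on the reformat commit.
hit=$(git blame --porcelain -L 1,1 -- app.js | sed -n '1s/ .*//p')
git log -1 --format='%h %an: %s' "$hit"    # "<hash> bot: Reformat with prettier"

# Step 2: blame that line as it was in the reformat commit's parent ("$hit^").
# Line 1 still lines up only because this toy reformat kept the line count.
author=$(git blame --line-porcelain -L 1,1 "$hit^" -- app.js | sed -n 's/^author //p')
echo "line 1 before the reformat: $author"  # alice
```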

Yeah, it sucks. For a while. But a given piece of history tends to become less important as it ages, so from there you just let the problem gradually diminish into the past.
