How do I run a code formatter over my source without modifying git history?

Question

I am trying to format an entire repo with a code formatter tool. In doing so, I want to keep the information about who committed which line, so that commands like git blame still show the correct information. By this, I mean blame should show the author who last edited each line before it was formatted.

There is the git filter-branch command, which lets you run a command against every revision of the repo, starting from the beginning of time.

git filter-branch --tree-filter '\
  npx prettier --write "src/main/web/app/**/*.{js,jsx}" || \
  echo "Error: no JS files found or invalid syntax"' \
  -- --all

Running this will take forever, and I really don't care about the past. I just want to format the master branch going forward without changing the ownership of each line. How can I do this? I tried playing with the rev-list at the end and with other filter types, but it still doesn't work. There must be a way to format the codebase while preserving the author information for each line.

Recommended answer

What you are trying to do is impossible. You cannot change a line of code at some point in time and yet have git report that the most recent change to that line happened before that point in time.

I suppose a source control tool could support the idea of an "unimportant change", where you mark a commit as cosmetic and history analysis then skips over that commit. I'm not sure how the tool would verify that the change really was cosmetic, and without some form of tool enforcement the feature would assuredly be misused, with bugs potentially being hidden in "unimportant" commits. But really the reasons I think it's a bad idea are academic here - the bottom line is that, at the time of writing, git doesn't have such a feature. (Nor can I think of any source control tool that does.)

You can change the formatting going forward. You can preserve the visibility of past changes. You can avoid editing history. But you cannot do all three at the same time, so you're going to have to decide which one to sacrifice.

There are actually a couple of downsides to the history rewrite, by the way. You mentioned processing time, so let's look at that first:

As you've noted, the straightforward way to do this with filter-branch would be very time consuming. There are things you can do to speed it up (like giving it a ramdisk for its working tree), but it's a tree-filter, so it still involves processing each version of each file.
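As a rough sketch of the ramdisk idea: git filter-branch accepts a -d option naming the temporary directory it uses for the rewrite, and that directory can sit on a RAM-backed filesystem. Everything below is an assumption made for illustration: the throwaway repository, the file app.js, the use of /dev/shm (a Linux convention, with a fallback to the normal temp directory), and a sed command that strips trailing whitespace as a stand-in for a real formatter such as prettier.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch is self-contained (hypothetical content).
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

printf 'var a = 1;   \n' > app.js
git add app.js
git commit -q -m 'first'
printf 'var b = 2;   \n' >> app.js
git commit -q -am 'second'

# Assumption: /dev/shm is RAM-backed (typical on Linux); otherwise fall back.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
  scratch_base=/dev/shm
else
  scratch_base=${TMPDIR:-/tmp}
fi
# filter-branch refuses to reuse an existing directory, so name a fresh subdirectory.
scratch=$(mktemp -d "$scratch_base/fb.XXXXXX")/work

# Newer git versions print a warning and pause unless this variable is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Stand-in formatter: strip trailing whitespace from every .js file in each revision.
git filter-branch -d "$scratch" --tree-filter '
  for f in *.js; do
    [ -f "$f" ] || continue
    sed "s/[[:space:]]*$//" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done' -- --all > /dev/null

first_version=$(git show HEAD~1:app.js)
commit_count=$(git rev-list --count HEAD)
echo "first revision now reads: $first_version"
```

On a real repository, the stand-in loop would be replaced by the formatter command from the question. The -d option only moves the scratch work off disk; every revision is still checked out and processed.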

If you did some pre-processing, you could be somewhat more efficient. For example, you might be able to preprocess every BLOB in the database and create a mapping (where a TREE contains BLOB X, replace it with BLOB Y), and then use an index-filter to perform the substitutions. This would avoid all the checkout and add operations, and it would avoid repeatedly re-formatting the same code files, which saves a lot of I/O. But it's a non-trivial thing to set up, and it might still be time consuming.
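The first half of that idea, formatting each unique blob once and recording an old-to-new mapping, might look roughly like the sketch below. It is not a complete rewrite tool: the index-filter step that applies the mapping is left out. The throwaway repository, the file names, the map file, and the format function (sed stripping trailing whitespace in place of a real formatter) are all assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository with three commits but only two distinct app.js blobs.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com

printf 'var a=1;   \n' > app.js
git add app.js
git commit -q -m 'first'
printf 'notes\n' > README
git add README
git commit -q -m 'second'   # app.js is unchanged, so its blob is reused
printf 'var a=2;   \n' > app.js
git commit -q -am 'third'

# Hypothetical stand-in for a real formatter: strip trailing whitespace.
format() { sed 's/[[:space:]]*$//'; }

map=$(mktemp)

# rev-list --objects lists every reachable object once, with its path;
# cat-file --batch-check adds the object type so non-blobs can be skipped.
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
  while read -r type old path; do
    [ "$type" = blob ] || continue
    case "$path" in
      *.js|*.jsx) ;;
      *) continue ;;
    esac
    # Format the blob once and store the result as a new blob in the object database.
    new=$(git cat-file blob "$old" | format | git hash-object -w --stdin)
    printf '%s %s\n' "$old" "$new"
  done > "$map"

mapped=$(wc -l < "$map" | tr -d ' ')
echo "formatted $mapped unique blobs"
```

Because each object is listed only once, a file that stays unchanged across many commits is formatted a single time, which is where the I/O saving described above comes from.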

(It's possible to write a more specialized tool based on this same principle, but AFAIK nobody has written one. There is precedent that more specialized tools can be faster than filter-branch...)

Even if you arrive at a solution that runs fast enough, bear in mind that the history rewrite will disturb all of your refs. As with any history rewrite, all users of the repo will have to update their clones - and for something this sweeping, the way I recommend doing that is to throw the clones out before you start the rewrite and re-clone afterward.

That also means that anything that depends on commit IDs will be broken too. (That could include build infrastructure, release documentation, etc., depending on your project's practices.)

So, a history rewrite is a pretty drastic solution. On the other hand, it also seems drastic to suppose that formatting the code is impossible simply because it wasn't done from day 1. So my advice:

Do the reformatting in a new commit. If you need to use git blame and it points you to the commit where the reformatting occurred, follow up by running git blame again on the reformat commit's parent.
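A minimal sketch of that follow-up, using a throwaway repository: the authors Alice and Bob, the file app.js, and its contents are all hypothetical. The point it illustrates is that blaming the reformat commit's parent (written as the commit ID followed by ^) shows who last touched each line before the formatter ran.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch is self-contained (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Alice writes the original, unformatted line.
printf 'var a=1;\n' > app.js
git add app.js
GIT_AUTHOR_NAME=Alice GIT_AUTHOR_EMAIL=alice@example.com \
GIT_COMMITTER_NAME=Alice GIT_COMMITTER_EMAIL=alice@example.com \
  git commit -q -m 'Add app.js'

# Bob commits the formatter's output, so plain git blame now credits him.
printf 'var a = 1;\n' > app.js
GIT_AUTHOR_NAME=Bob GIT_AUTHOR_EMAIL=bob@example.com \
GIT_COMMITTER_NAME=Bob GIT_COMMITTER_EMAIL=bob@example.com \
  git commit -q -am 'Reformat with code formatter'
reformat=$(git rev-parse HEAD)

# Step 1: blame at HEAD points at the reformat commit.
blamed_commit=$(git blame --porcelain -- app.js | head -n 1 | cut -d' ' -f1)

# Step 2: blame again, starting from the reformat commit's parent.
original_author=$(git blame --line-porcelain "$reformat^" -- app.js |
  sed -n 's/^author //p')

echo "HEAD blames commit $blamed_commit"
echo "parent of the reformat commit blames: $original_author"
```

Separately from the approach described here, Git releases from 2.23 onward provide git blame --ignore-rev <commit>, which asks blame to look past a named commit; that option is not part of the original answer.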

Yeah, it sucks. For a while. But a given piece of history tends to become less important as it ages, so from there you just let the problem gradually recede into the past.
