Collapsing a git repository's history

Question

We have a git project which has quite a big history.

Specifically, early in the project there were quite a lot of binary resource files in the repository; these have since been removed because they are effectively external resources.

However, the size of our repository is >200MB (the total checkout is currently ~20MB) due to having these files previously committed.

What we'd like to do is "collapse" the history so that the repository appears to have been created from a later revision than it actually was. For example:

1-----2-----3-----4-----+---+---+
                          /
                    +-----+---+---+

  1. Repository created
  2. Large number of binary files added
  3. Large number of binary files removed
  4. New "start" of the repository

So effectively we want to lose the project history before a certain point. At this point there is only one branch, so there's no complication with trying to deal with multiple start points etc. However we don't want to lose all of the history and start a new repository with the current version.

Is this possible, or are we doomed to have a bloated repository forever?

Recommended answer

You can remove the binary bloat and keep the rest of your history. Git allows you to reorder and 'squash' prior commits, so you can combine just the commits that add and remove your big binary files. If the adds were all done in one commit and the removals in another, this will be much easier than dealing with each file.

$ git log --stat       # list every commit with its message and the files it changed

Search this for the commits that add and delete your binary files and note their SHA1s, say 2bcdef and 3cdef3.
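On a long history the git log --stat output can be large. One way to narrow the search, assuming the files you are after are ones Git itself classifies as binary, is git log --numstat, which prints "-" and "-" instead of line counts for binary files, combined with --diff-filter=A (files added) or --diff-filter=D (files deleted). The sketch below wraps that in a hypothetical binary_commits shell function (not a git command) and runs it on a throwaway repository shaped like the one in the question; on your own repository you would skip the setup part and just call the function.

```shell
#!/bin/sh
# Sketch: list commits that add (A) or delete (D) files Git treats as binary.
# Assumptions: POSIX sh, awk, head -c; binary_commits, repo, added, deleted are
# made-up names for this demo.
set -eu

# binary_commits FILTER: hypothetical helper. FILTER is a --diff-filter letter.
# --numstat prints "-<TAB>-<TAB>path" for binary files instead of line counts.
binary_commits() {
  git log --reverse --diff-filter="$1" --numstat --format='@%h %s' |
    awk '/^@/ { commit = substr($0, 2); next }              # header line: "<sha> <subject>"
         $1 == "-" && $2 == "-" && !seen[commit]++ { print commit }'
}

# --- demo setup: a scratch repo shaped like the one in the question ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > README
git add README && git commit -q -m "Create repository"

head -c 65536 /dev/urandom > logo.bin    # random bytes, so Git sees them as binary
head -c 65536 /dev/urandom > sound.bin
echo "v2" > README
git add -A && git commit -q -m "Add binary files and other edits"

echo "code" > src.txt
git add src.txt && git commit -q -m "Another change"

git rm -q logo.bin sound.bin
git commit -q -m "Remove binary files; link to them as external resources"

# --- the actual search ---
added=$(binary_commits A)
deleted=$(binary_commits D)
echo "adds binaries:    $added"
echo "deletes binaries: $deleted"
```

Each commit is printed once even when it touches several binaries, so you get exactly the SHA1s to feed into the rebase step.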

Then, to edit the repo's history, run git rebase with its -i (interactive) option, starting from the parent of the commit where you added your binaries (2bcdef^). It will launch your $EDITOR and show a list of commits starting with 2bcdef:

$ git rebase -i 2bcdef^    # generate a pick list of all commits starting with 2bcdef
# Rebasing zzzzzz onto yyyyyyy 
# 
# Commands: 
#  pick = use commit 
#  edit = use commit, but stop for amending 
#  squash = use commit, but meld into previous commit 
# 
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 2bcdef   Add binary files and other edits
pick xxxxxx   Another change
  .
  .
pick 3cdef3   Remove binary files; link to them as external resources
  .
  .

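Insert squash 3cdef3 as the second line, directly below pick 2bcdef, and delete the line that says pick 3cdef3. The list now tells the interactive rebase to combine the commit that adds your binaries and the commit that deletes them into one commit, whose diff contains only the other changes from those two commits, and then to replay all of the subsequent commits in order. Save and close the editor: Git opens it once more so you can edit the combined commit message, then carries on by itself. Moving 3cdef3 earlier can cause conflicts if the commits in between touched the same files; if the rebase stops, resolve the conflict, stage the result with git add, and resume with: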

$ git rebase --continue

This will take a minute or two.
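
To rehearse the reorder-and-squash before touching the real repository, the sketch below builds a scratch repository and makes the same todo-list edit non-interactively. It relies on Git's GIT_SEQUENCE_EDITOR environment variable (the program Git runs to edit the rebase todo list, given the file path as its argument) and on GIT_EDITOR=true to accept the combined commit message that squash proposes. The awk script, the commit_all helper and the variable names are made up for this demo.

```shell
#!/bin/sh
# Sketch: replay the answer's rebase edit in a throwaway repo, without an editor.
# Assumptions: POSIX sh, awk, head -c; repo, ADD, DEL_FULL, commit_all and
# move-squash.sh are hypothetical names used only here.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

commit_all() { git add -A && git commit -q -m "$1"; }   # hypothetical helper

echo "v1" > README
commit_all "Create repository"

head -c 65536 /dev/urandom > big.bin
echo "v2" > README
commit_all "Add binary files and other edits"
ADD=$(git rev-parse HEAD)                 # plays the role of 2bcdef

echo "code" > src.txt
commit_all "Another change"

rm big.bin
commit_all "Remove binary files"
DEL_FULL=$(git rev-parse HEAD)            # plays the role of 3cdef3
export DEL_FULL                           # read by the sequence-editor script

echo "more code" >> src.txt
commit_all "Later work"

# Sequence editor: drop "pick <DEL>" and re-insert it as "squash <DEL>" right
# after the first pick line, i.e. the edit the answer asks you to make by hand.
cat > .git/move-squash.sh <<'EOF'
awk -v full="$DEL_FULL" '
  $1 == "pick" && index(full, $2) == 1 { next }           # remove original pick
  { print }
  !done && $1 == "pick" { print "squash " full; done = 1 } # insert as 2nd line
' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# GIT_EDITOR=true keeps the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sh '$repo/.git/move-squash.sh'" GIT_EDITOR=true \
  git rebase -q -i "$ADD^"

git --no-pager log --oneline --stat
```

Afterwards the branch has four commits instead of five, and git log -- big.bin prints nothing because no commit on the rewritten branch touches the file. The old commits still sit in the object database, reachable through the reflog, which is what the cleanup step deals with.
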
You now have a repo in which the binaries are never added or removed. They still take up space, though, because by default Git keeps unreachable commits in the reflog for 30 days (the gc.reflogExpireUnreachable setting) before they can be garbage-collected, so that you can change your mind. If you want to remove them now:

$ git reflog expire --expire=now --all
      # expire every reflog entry (all branches and HEAD) so the old commits become unreachable
$ git fsck --unreachable      # lists the unreachable objects (commits, trees, blobs/files) that will be garbage-collected
$ git gc --prune=now          # repack, then delete unreachable objects immediately
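To check that the cleanup really frees the space, here is a sketch that simulates the leftover objects in a scratch repository: git reset --hard drops a commit holding a large blob, standing in for the commits the rebase rewrote. It also shows why expiring only the branch's reflog is not always enough: HEAD has its own reflog that still points at the dropped commit. The names repo, blob, after_branch_only and after_all are made up for the demo; the first check should find the blob still present and the second should not.

```shell
#!/bin/sh
# Sketch: make a large blob unreachable in a throwaway repo, then clean it up.
# Assumptions: POSIX sh, head -c; all variable names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Create repository"

# Commit a ~1 MB binary, then throw that commit away.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add binary file"
blob=$(git rev-parse HEAD:big.bin)
git reset -q --hard HEAD~1

# 1) Expire only the current branch's reflog: HEAD's reflog still references
#    the dropped commit, so prune keeps the blob.
git reflog expire --expire=now "$(git symbolic-ref HEAD)"
git prune
if git cat-file -e "$blob" 2>/dev/null; then after_branch_only=present; else after_branch_only=gone; fi

# 2) Expire every reflog, then repack and prune unreachable objects right away.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_all=present; else after_all=gone; fi

echo "branch reflog only: $after_branch_only"
echo "all reflogs + gc:   $after_all"
```

Running git count-objects -vH before and after the cleanup on a real repository is a simple way to watch the size-pack figure drop.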

Now you've removed the bloat but kept the rest of your history.
