Collapsing a git repository's history
Problem description
We have a git project which has quite a big history.
Specifically, early in the project there were quite a lot of binary resource files in the repository; these have now been removed, as they're effectively external resources.
However, the repository is still >200MB (the total checkout is currently ~20MB) because these files were previously committed.
What we'd like to do is "collapse" the history so that the repository appears to have been created from a later revision than it was. For example:
1-----2-----3-----4-----+---+---+
                   \       /
                    +-----+---+---+
1. Repository created
2. Large set of binary files added
3. Large set of binary files removed
4. New intended 'start' of repository
So, effectively, we want to lose the project history before a certain point. At this point there is only one branch, so there's no complication with trying to deal with multiple start points, etc. However, we don't want to lose all of the history and start a new repository with the current version.
Is this possible, or are we doomed to have a bloated repository forever?
Solution
You can remove the binary bloat and keep the rest of your history. Git allows you to reorder and 'squash' prior commits, so you can combine just the commits that add and remove your big binary files. If the adds were all done in one commit and the removals in another, this will be much easier than dealing with each file.
$ git log --stat # list all commits with their messages and per-file change stats
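When the log is long, the two commits can also be located mechanically. The following is a minimal sketch run in a throwaway repository; the file name big.bin, the commit messages and the 100000-byte size are assumptions made up for the demonstration, not part of the original project. It lists the largest blobs anywhere in history and then asks git which commits added and deleted that path.

```shell
#!/bin/sh
# Sketch only: builds a scratch repo so the commands have something to find.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > README
git add -A && git commit -q -m "Repository created"
head -c 100000 /dev/urandom > big.bin          # hypothetical large binary
git add -A && git commit -q -m "Add binary files"
git rm -q big.bin
git commit -q -m "Remove binary files"

# Largest blobs in all of history, biggest first: "<size> <object id> <path>".
# (Paths containing spaces would be truncated by this simple awk filter.)
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
  awk '$1 == "blob" { print $2, $3, $4 }' |
  sort -n -r | head -n 5)
echo "$largest"

# Commits that added (A) or deleted (D) that path.
added=$(git log --diff-filter=A --format='%h %s' -- big.bin)
deleted=$(git log --diff-filter=D --format='%h %s' -- big.bin)
echo "added in:   $added"
echo "deleted in: $deleted"
```

In a real repository only the last two pipelines are needed, with big.bin replaced by a path reported in the first listing.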
Search the log output for the commits that add and delete your binary files and note their SHA1s, say 2bcdef and 3cdef3.
Then, to edit the repo's history, use the rebase command with its interactive option (rebase -i), starting with the parent of the commit where you added your binaries. It will launch your $EDITOR and you'll see a list of commits starting with 2bcdef:
$ git rebase -i 2bcdef^ # generate a pick list of all commits starting with 2bcdef
# Rebasing zzzzzz onto yyyyyyy
#
# Commands:
#  pick = use commit
#  edit = use commit, but stop for amending
#  squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 2bcdef Add binary files and other edits
pick xxxxxx Another change
  .
  .
pick 3cdef3 Remove binary files; link to them as external resources
  .
  .
Insert squash 3cdef3 as the second line and remove the line which says pick 3cdef3 from the list. You now have a list of actions for the interactive rebase which will combine the commits that add and delete your binaries into one commit whose diff is just any other changes in those commits. When you save the list and close the editor, git reapplies all of the subsequent commits in order. If the rebase stops partway (for example, on a conflict), resolve the problem and tell it to carry on:
$ git rebase --continue
This will take a minute or two.
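The same edit of the pick list can be scripted, which makes it easy to rehearse on a copy first. This is a sketch under stated assumptions: it builds a throwaway repository whose file names and commit messages are invented to mirror the example, and it uses a small helper script (hypothetical, written here only for the demo) as GIT_SEQUENCE_EDITOR to move the removal commit to line two as a squash. GIT_EDITOR=true simply accepts the combined commit message.

```shell
#!/bin/sh
# Sketch only: scripted squash of the "add" and "remove" commits in a scratch repo.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > README
git add -A && git commit -q -m "Repository created"

head -c 100000 /dev/urandom > big.bin          # hypothetical large binary
echo "code" > app.txt
git add -A && git commit -q -m "Add binary files and other edits"
add_sha=$(git rev-parse HEAD)

echo "more" >> app.txt
git add -A && git commit -q -m "Another change"

git rm -q big.bin
git commit -q -m "Remove binary files"
rm_sha=$(git rev-parse --short=7 HEAD)

echo "later" >> README
git add -A && git commit -q -m "Later change"

# Stand-in for editing the list by hand: keep line 1, put the removal commit
# second as a squash, and drop its original pick line.
editor="$repo/.git/squash-todo.sh"
cat > "$editor" <<'EOF'
#!/bin/sh
todo=$1
rm_line=$(grep "^pick $RM_SHA" "$todo")
{
  sed -n '1p' "$todo"
  printf '%s\n' "$rm_line" | sed 's/^pick/squash/'
  sed '1d' "$todo" | grep -v "^pick $RM_SHA"
} > "$todo.new"
mv "$todo.new" "$todo"
EOF

RM_SHA=$rm_sha GIT_SEQUENCE_EDITOR="sh '$editor'" GIT_EDITOR=true \
  git rebase -q -i "$add_sha^"

commits=$(git rev-list --count HEAD)
touching=$(git log --format=%h -- big.bin)
echo "commits on branch: $commits"
echo "commits touching big.bin: ${touching:-none}"
```

In this demo the reorder cannot conflict because no commit between the add and the removal touches the binary. In a real history, intermediate commits that modify the binaries would make the rebase stop for manual resolution.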
You now have a repo whose history no longer has the binaries coming or going. But they will still take up space because, by default, Git keeps reflog entries for commits that are no longer reachable for about 30 days before they can be garbage-collected, so that you can change your mind. If you want to remove them now:
$ git reflog expire --expire=1.minute refs/heads/master # reflog entries older than 1 minute become eligible for garbage collection
$ git fsck --unreachable # lists all the blobs (files) that will be garbage-collected
$ git prune
$ git gc
Now you've removed the bloat but kept the rest of your history.
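One way to check that the cleanup really frees the objects is to try it on a scratch repository. The sketch below is an assumption-laden demo, not the answer's exact commands: it swaps in git reflog expire --expire=now --all and git gc --prune=now, which are more aggressive than the commands above because they drop every reflog entry (HEAD's reflog included) and prune immediately. The branch name bloat and the file big.bin are made up for the example.

```shell
#!/bin/sh
# Sketch only: show that an unreachable blob survives until reflogs are expired.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "v1" > README
git add -A && git commit -q -m "Repository created"
start=$(git symbolic-ref --short HEAD)

# Commit a large file on a side branch, then delete the branch, so the blob
# is referenced only by HEAD's reflog (a stand-in for rewritten history).
git checkout -q -b bloat
head -c 100000 /dev/urandom > big.bin          # hypothetical large binary
git add -A && git commit -q -m "Add binary file"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q "$start"
git branch -q -D bloat

if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all   # forget every reflog entry
git gc -q --prune=now                  # repack and delete unreachable objects

if git cat-file -e "$blob" 2>/dev/null; then state=present; else state=gone; fi
echo "before cleanup: $before"
echo "after cleanup:  $state"
```

On the real repository, running git count-objects -vH before and after the cleanup gives a quick view of how much space was recovered.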