Collapsing a git repository's history


Problem description




    We have a git project which has quite a big history.

    Specifically, early in the project there were quite a lot of binary resource files in the repository. These have now been removed, as they're effectively external resources.

    However, the size of our repository is >200MB (the total checkout is currently ~20MB) due to having these files previously committed.

    What we'd like to do is "collapse" the history so that the repository appears to have been created from a later revision than it was. For example

    1-----2-----3-----4-----+---+---+
                       \       /
                        +-----+---+---+
    

    1. Repository created
    2. Large set of binary files added
    3. Large set of binary files removed
    4. New intended 'start' of repository

    So effectively we want to lose the project history before a certain point. At this point there is only one branch, so there's no complication with trying to deal with multiple start points etc. However we don't want to lose all of the history and start a new repository with the current version.

    Is this possible, or are we doomed to have a bloated repository forever?

    Solution

    You can remove the binary bloat and keep the rest of your history. Git allows you to reorder and 'squash' prior commits, so you can combine just the commits that add and remove your big binary files. If the adds were all done in one commit and the removals in another, this will be much easier than dealing with each file.

    $ git log --stat       # list all commits with their messages and per-file change stats
    

    Search this for the commits that add and delete your binary files and note their SHA1s, say 2bcdef and 3cdef3.
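If the log is too long to scan by eye, the search can be narrowed with a couple of helper functions. This is only a sketch: the function names `largest_blobs` and `commits_touching` are made up for illustration, and it assumes you run them from inside the repository.

```shell
#!/bin/sh
# largest_blobs [N]
# Print the N largest blobs (default 10) reachable from any ref, largest
# first, one per line as "<size-in-bytes> <sha1> <path>".
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only, drop the type column
    sort -k1,1nr |           # numeric sort on size, descending
    head -n "$n"
}

# commits_touching PATH
# Print "<short-sha> <subject>" for every commit that added (A) or
# deleted (D) PATH, newest first.
commits_touching() {
  git log --all --diff-filter=AD --format='%h %s' -- "$1"
}

# Example (assets/big.bin is a hypothetical path):
#   largest_blobs 5
#   commits_touching assets/big.bin
```

The SHA1s printed by `commits_touching` for your binary paths are the ones to note down for the rebase.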

    Then, to edit the repo's history, use the rebase command with its interactive option (-i), starting from the parent of the commit where you added your binaries. It will launch your $EDITOR and you'll see a list of commits starting with 2bcdef:

    $ git rebase -i 2bcdef^    # generate a pick list of all commits starting with 2bcdef
    # Rebasing zzzzzz onto yyyyyyy 
    # 
    # Commands: 
    #  pick = use commit 
    #  edit = use commit, but stop for amending 
    #  squash = use commit, but meld into previous commit 
    # 
    # If you remove a line here THAT COMMIT WILL BE LOST.
    #
    pick 2bcdef   Add binary files and other edits
    pick xxxxxx   Another change
      .
      .
    pick 3cdef3   Remove binary files; link to them as external resources
      .
      .
    

    Insert squash 3cdef3 as the second line and remove the line which says pick 3cdef3 from the list. You now have a list of actions for the interactive rebase which will combine the commits that add and delete your binaries into one commit whose diff is just any other changes in those commits. When you save and close the editor, Git performs that squash (asking you to confirm the combined commit message) and then reapplies all of the subsequent commits in order. If it stops partway, for example on a conflict, fix the problem and resume with:

    $ git rebase --continue
    

    This will take a minute or two.
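The same todo-list edit can also be scripted rather than typed by hand. The following is a sketch, not a drop-in tool: `squash_pair` is a hypothetical helper name, it assumes the commit that adds the binaries is not the root commit, that the history between the two commits is linear, and that no commit in between touches the binaries. It relies on the `GIT_SEQUENCE_EDITOR` and `GIT_EDITOR` environment variables, which Git consults for the todo list and for commit messages respectively.

```shell
#!/bin/sh
# squash_pair ADD_SHA REMOVE_SHA
# Moves the REMOVE_SHA line directly under the ADD_SHA line in the rebase
# todo list and changes its command from "pick" to "squash".
squash_pair() {
  add_full=$(git rev-parse --verify "$1^{commit}") || return 1
  rm_full=$(git rev-parse --verify "$2^{commit}") || return 1

  editor=$(mktemp) || return 1
  cat > "$editor" <<'EOF'
#!/bin/sh
# Git passes the todo file as $1; ADD_FULL and RM_FULL come from the environment.
todo=$1
awk -v add="$ADD_FULL" -v rm="$RM_FULL" '
  # The todo list abbreviates hashes, so compare them as prefixes.
  $1 == "pick" && index(rm, $2) == 1 { $1 = "squash"; held = $0; next }
  { lines[++n] = $0 }
  END {
    if (held == "") exit 1      # removal commit not in the list
    for (i = 1; i <= n; i++) {
      print lines[i]
      split(lines[i], f, " ")
      if (!placed && f[1] == "pick" && index(add, f[2]) == 1) {
        print held
        placed = 1
      }
    }
    if (!placed) exit 1         # addition commit not in the list
  }
' "$todo" > "$todo.new" && mv "$todo.new" "$todo"
EOF

  # GIT_EDITOR=true accepts the default combined message for the squash.
  ADD_FULL=$add_full RM_FULL=$rm_full \
  GIT_SEQUENCE_EDITOR="sh '$editor'" GIT_EDITOR=true \
    git rebase -i "$add_full^"
  status=$?
  rm -f "$editor"
  return $status
}

# Example, using the placeholder SHA1s from the text:
#   squash_pair 2bcdef 3cdef3
```

If either commit is missing from the list, the helper script exits non-zero without touching the todo file, so a commit is not silently dropped. Try it on a throwaway clone first.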
    You now have a repo whose history no longer has the binaries coming or going. But they will still take up space because, by default, Git keeps commits that are no longer reachable around for about 30 days (through the reflog) before they can be garbage-collected, so that you can change your mind. If you want to remove them now:

    $ git reflog expire --expire=1.minute refs/heads/master
          # reflog entries older than 1 minute become available for garbage collection
    $ git fsck --unreachable      # lists all the blobs (files) that will be garbage-collected
    $ git prune
    $ git gc                      
    

    Now you've removed the bloat but kept the rest of your history.
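To check that the space really was reclaimed, you can compare the object store size before and after the cleanup. This is a sketch of a more aggressive variant of the commands above, under these assumptions: `drop_unreachable_now` is a hypothetical helper name, it expires the reflogs of every ref (not only master), and no tag, stash, or remote-tracking branch still points at the old commits. Once it has run, the pre-rebase history can no longer be recovered from this clone.

```shell
#!/bin/sh
# drop_unreachable_now
# Expire all reflog entries immediately, prune every unreachable object,
# and report the object store size before and after.
drop_unreachable_now() {
  echo "before:"
  git count-objects -vH                 # size-pack / size show disk usage

  git reflog expire --expire=now --all  # drop reflog entries for all refs
  git gc --prune=now                    # repack and delete unreachable objects

  echo "after:"
  git count-objects -vH
}

# Example:
#   drop_unreachable_now
```

Other clones keep their own copy of the old objects, so collaborators would need a fresh clone (or the same cleanup) to see the smaller size.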
