Git: how to clean the history pack for old deleted files?


Problem description


I imported a very old SVN project with git svn clone. The problem was that I picked the root folder of that repo, from which I had already imported all the other sub-projects (into new git repos), deleting each one from SVN afterwards. So, when importing the root folder (with the final 8 subfolders) into one single git repo, the history of the full repo was imported as well (including the history of the deleted sub-projects).

I ran several commands to clean up the pack file, with no success. It always stays at 571 MB. The only command that reduced it a bit was:

git repack -a -d --depth=500 --window=1000 -f

Googling, I found lots of help on deleting files or removing the history of big blobs, but nothing for files that have already vanished.

I created a list of all the deleted folders I need to purge (top-level folders only), with this command:

git log --diff-filter=D --summary | grep delete | cut -d" " -f5 | cut -d"/" -f1 | grep -v "\"" | sort | uniq > /tmp/tokill.txt

Then I ran this (after a little editing of the list, to keep 2 folders out of the history deletion):

git filter-branch --index-filter 'cat /tmp/tokill.txt | xargs git rm --cached --ignore-unmatch -r'

At this point the log had been rewritten and I could no longer list the deleted files. But the pack was still 571 MB, even after repack, gc and/or prune.

What am I missing? Any help is appreciated.

Best, Lovato


ADDED on 2014-08-05:

Just to clarify a bit more: I have already preserved the history of the individual sub-projects, because I migrated them to git. After that, these folders were wiped out from SVN. So I really want to get rid of that history, because it does not belong in this scope. I understand that this is a weird thing to ask of git, but I would like to know whether I can do it or not.

I split one huge SVN repo into several git repos to make everyone's life easier. The original SVN repo is 6 years old and has tons^2 of SVN commits, so I cannot go through them one by one to check whether each should be removed.

About size: without that history (which contains the history of big blobs) the repo is less than 1 MB. It's just a bunch of Java code, docs and a few images.

The (perhaps) correct way would have been to first move all those root folders into a folder called "last_project" and then git svn clone this "last_project", so that all history belonging to "/" (which means ALL history) would have stayed in SVN.


ADDED on 2014-08-05 - II: partial solution

When reviewing my question, Stack Overflow started to suggest other similar questions that I had not found earlier, because they are only loosely related. One of those is about the BFG tool. BFG does not clear "history for files that no longer exist on disk", but it did a pretty good job of erasing all history for files that were (at some point) bigger than X KB. My total repo size is now 20 MB, and Jenkins (and everyone else) can download it in 2 seconds from now on.

http://rtyley.github.io/bfg-repo-cleaner/
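
Before picking a size threshold for a tool like BFG, it can help to see which blobs in the history are actually large. The following is a minimal sketch using only stock git commands. The function name list_big_blobs and the throwaway demo repository are made up for illustration and are not part of the original question.

```shell
#!/bin/sh
set -e

# Print "<size in bytes> <path>" for every blob in the history, largest first.
list_big_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn
}

# Throwaway demo repository (hypothetical content).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name demo
git config user.email demo@example.com
echo hello > small.txt
head -c 20000 /dev/zero > big.bin
git add small.txt big.bin
git commit -q -m 'add files'
git rm -q big.bin
git commit -q -m 'delete big.bin'

# big.bin is gone from the working tree but is still listed from history.
list_big_blobs | head -n 3
```

Run inside a real clone, only the function is needed. Files that were deleted long ago still appear because the listing walks every commit reachable from any ref.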

I still have a bare copy of my original repo, to apply any solution that may be suggested.


ADDED on 2014-08-06:

I had to completely wipe out my old git repo, create a new one, and then push the newly rewritten repo. It's working now. Not the way I wanted, but working.

Solution

It seems like you want items that were present in the past but are no longer part of the repository to be deleted from git.

Unfortunately, git doesn't work like that. Because these items are part of the history (that is, there are still branches/refs/tags kicking around that refer to these commits in their history), they will stick around and so will objects related to those commits.

The only way to remove them completely would be to remove them from your git history. If you have a branch that refers to them, you could either delete that branch or rebase it so that it doesn't include those commits. Either way, git's garbage collection will kick in and get rid of them.
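
In practice, garbage collection only drops objects once nothing at all refers to them, and that includes the backup refs that git filter-branch writes under refs/original/ as well as reflog entries. The sketch below shows one way to clear those and prune immediately. The helper name purge_unreachable and the demo repository are illustrative assumptions. In a git svn clone, remote-tracking refs under refs/remotes/ may also still point at the old history, which this sketch does not touch.

```shell
#!/bin/sh
set -e

# Drop what commonly keeps rewritten-away commits alive, then prune.
purge_unreachable() {
    # Backup refs left behind by git filter-branch live under refs/original/.
    git for-each-ref --format='delete %(refname)' refs/original |
        git update-ref --stdin
    # Reflog entries count as references until they expire.
    git reflog expire --expire=now --all
    # Prune unreachable objects now instead of after the grace period.
    git gc --quiet --prune=now
}

# Throwaway demo repository (hypothetical content).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m 'root'
git checkout -q -b doomed
echo payload > big.txt
git add big.txt
git commit -q -m 'add big.txt'
blob=$(git rev-parse HEAD:big.txt)
git checkout -q -
git branch -q -D doomed

# The branch is deleted, yet the reflog still keeps the blob alive.
git cat-file -e "$blob" && echo "before: blob still in the object store"
purge_unreachable
git cat-file -e "$blob" 2>/dev/null || echo "after: blob is gone"
```

This is destructive: once the reflogs are expired and the objects pruned, the old history cannot be recovered from that repository, so it is worth running on a copy first.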

However, why do you want to do this? 571MB is not particularly large and you will be removing history completely.

Another way to do this is:

  1. Create an empty repository somewhere else
  2. Create an empty root commit in this new repository (git commit --allow-empty -m 'root commit')
  3. Add the git-svn repository as a remote (they will have nothing in common)
  4. Add a new local branch that tracks the remote branch you want
  5. Rebase this local branch onto your new empty root commit.
  6. When it's done, interactive rebase (rebase -i) one more time and fixup the commits you don't want (this will essentially combine all of them into one commit with the effect that all deleted files will get removed, but any changes to files that do exist will persist through history).
  7. Solve any conflicts. When that's done, you will have a new, pure git repository with only the history you need.
  8. Remove the remote.
  9. Run git gc
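
The non-interactive steps above (1 to 5, 8 and 9) can be sketched roughly as follows. The remote name old, the branch name imported and the tiny stand-in repository that plays the role of the git-svn clone are assumptions for illustration only. Steps 6 and 7 (the interactive rebase and conflict resolution) are left out because they depend on the actual history.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Stand-in for the existing git-svn clone (hypothetical content).
git init -q "$work/old"
git -C "$work/old" config user.name demo
git -C "$work/old" config user.email demo@example.com
echo a > "$work/old/a.txt"
git -C "$work/old" add a.txt
git -C "$work/old" commit -q -m 'add a'
echo b > "$work/old/b.txt"
git -C "$work/old" add b.txt
git -C "$work/old" commit -q -m 'add b'
old_branch=$(git -C "$work/old" symbolic-ref --short HEAD)

# Steps 1-2: new repository with an empty root commit.
git init -q "$work/new"
cd "$work/new"
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m 'root commit'
root=$(git rev-parse HEAD)

# Step 3: add the old repository as a remote and fetch it.
git remote add old "$work/old"
git fetch -q old

# Step 4: local branch tracking the remote branch of interest.
git checkout -q -b imported "old/$old_branch"

# Step 5: replay the whole branch on top of the empty root commit.
git rebase -q --root --onto "$root"

# Steps 6-7 (git rebase -i, fixing up commits, resolving conflicts)
# would happen here, by hand.

# Steps 8-9: drop the remote and collect garbage.
git remote remove old
git gc --quiet

git log --oneline
```

The original repository is only read from, never written to, which matches the claim that the git-svn clone stays untouched.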

Your new repository should now be a lot smaller and your original git-svn repository should be untouched.

There is one gotcha: You should be aware that git-svn will not honor svn externals in your original svn repository and so you can only trust the git-svn repo if your svn repository does not use externals.

UPDATE

Separating out sub-projects is fine as long as you preserve the inter-dependencies. For example:

In order to build Parent project version 45, you need:
    version 2 of sub-project A
    version 10 of sub-project B
    ...
    version 30 of sub-project Z
