New repo with copied history of only currently tracked files


Problem description


Our current repo has tens of thousands of commits, and a fresh clone transfers nearly a gigabyte of data (the history contains lots of jar files that have since been deleted). We'd like to cut this size down by making a new repo that keeps the full history for just the files that are currently active in the repo, or possibly by modifying the current repo to clear the history of deleted files. But I'm not sure how to do this in a practical manner.

I've tried the script from "Remove deleted files from git history":

for del in `cat deleted.txt`
do
    git filter-branch --index-filter "git rm --cached --ignore-unmatch $del" --prune-empty -- --all
    # The following seems to be necessary every time
    # because otherwise git won't overwrite refs/original
    git reset --hard
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
done;

But given that we have tens of thousands of deleted files in the history and tens of thousands of commits, running the script would take an eternity. I started running this for just ONE deleted file 2 hours ago and the filter-branch command is still running. It goes through each of the 40,000+ commits one at a time, and this is on a new MacBook Pro with an SSD.

I've also read the page https://help.github.com/articles/remove-sensitive-data, but it only covers removing single files.

Has anyone been able to do this? I really want to preserve the history of currently tracked files. I'm not sure the space savings would be worth creating a new repo if we can't keep the history.

Solution

Delete everything and restore what you want

Rather than deleting a list of files one at a time, do almost the opposite: delete everything, then restore only the files you want to keep:

$ git checkout master
$ git ls-files > keep-these.txt
$ git filter-branch --force --index-filter \
  "git rm  --ignore-unmatch --cached -qr . ; \
  cat $PWD/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" \
  --prune-empty --tag-name-filter cat -- --all

This may be faster to execute, because the whole history is rewritten in a single pass instead of once per deleted file.
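One way to sanity-check the rewrite is to compare every path that still appears in history against keep-these.txt. The following is only a sketch, assuming a POSIX shell; the helper name paths_not_kept and the file history-paths.txt are made up for illustration and are not part of the original answer.

```shell
# paths_not_kept HISTORY_LIST KEEP_LIST
# Prints the paths that appear in HISTORY_LIST but not in KEEP_LIST.
# Hypothetical helper: both arguments are text files with one path per line.
paths_not_kept() {
    hist_sorted=$(mktemp) || return 1
    keep_sorted=$(mktemp) || return 1
    # Sort, de-duplicate, and drop the blank lines that git log emits.
    sort -u "$1" | sed '/^$/d' > "$hist_sorted"
    sort -u "$2" | sed '/^$/d' > "$keep_sorted"
    # comm -23 prints lines found only in the first (sorted) file.
    comm -23 "$hist_sorted" "$keep_sorted"
    rm -f "$hist_sorted" "$keep_sorted"
}

# Usage inside the rewritten repository (not run here):
#   git log --all --name-only --pretty=format: > history-paths.txt
#   paths_not_kept history-paths.txt keep-these.txt
```

If the helper prints nothing, no path outside keep-these.txt survives in the rewritten history. Note that a file which was renamed at some point is only restored under its current name, so its history under the old name is not kept.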

Cleanup steps

Once the whole process has finished, clean up:

$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now

# optional extra gc. Slow and may not further-reduce the repo size
$ git gc --aggressive --prune=now

Comparing the repository size before and after should show a sizable reduction. The history will then contain only the commits that touch the kept files, plus merge commits, which are kept even when empty because that's how --prune-empty works.
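To put a number on the before/after comparison, the output of git count-objects -v can be summed. The sketch below is an illustration rather than part of the original answer; the helper name sum_object_kib is hypothetical, and it relies on the size and size-pack fields, which git reports in KiB.

```shell
# sum_object_kib
# Reads `git count-objects -v` output on stdin and prints the total of the
# "size" (loose objects) and "size-pack" (packfiles) fields, both in KiB.
# Hypothetical helper name, used here only for illustration.
sum_object_kib() {
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Usage (not run here):
#   before=$(git count-objects -v | sum_object_kib)   # before filter-branch
#   ... rewrite history and run the cleanup steps ...
#   after=$(git count-objects -v | sum_object_kib)
#   echo "saved $((before - after)) KiB"
```

Measuring only after the cleanup steps matters, since the old objects stay on disk until refs/original and the reflog no longer reference them.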

$GIT_COMMIT?

The use of $GIT_COMMIT seems to have caused some confusion. From the git filter-branch documentation (emphasis added):

The argument is always evaluated in the shell context using the eval command (with the notable exception of the commit filter, for technical reasons). Prior to that, the $GIT_COMMIT environment variable will be set to contain the id of the commit being rewritten.

That means git filter-branch provides the variable at run time; you do not set it beforehand. If there's any doubt, this can be demonstrated with the following no-op filter-branch command:

$ git filter-branch --index-filter "echo current commit is \$GIT_COMMIT"
Rewrite d832800a85be9ef4ee6fda2fe4b3b6715c8bb860 (1/xxxxx)current commit is d832800a85be9ef4ee6fda2fe4b3b6715c8bb860
Rewrite cd86555549ac17aeaa28abecaf450b49ce5ae663 (2/xxxxx)current commit is cd86555549ac17aeaa28abecaf450b49ce5ae663
...
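The same quoting behaviour can be seen without touching a repository. This is a small simulation, not how filter-branch is implemented: run_filter is a hypothetical stand-in for one iteration, and the commit ids are shortened placeholders. Inside double quotes, $PWD is expanded while the string is built, whereas \$GIT_COMMIT stays literal until the string is eval'd.

```shell
# Build a filter string the way the answer does, inside double quotes.
# $PWD is expanded right now; \$GIT_COMMIT remains a literal "$GIT_COMMIT".
filter="echo list=$PWD/keep-these.txt commit=\$GIT_COMMIT"

# run_filter ID
# Hypothetical stand-in for one filter-branch iteration: set GIT_COMMIT,
# then eval the filter string in a subshell so the variable does not leak.
run_filter() {
    (
        GIT_COMMIT=$1
        eval "$filter"
    )
}

# Each call prints the same list path but a different commit id.
run_filter d832800
run_filter cd86555
```

This is also why the answer writes $PWD unescaped: filter-branch runs the filter from a temporary working directory, so the path to keep-these.txt has to be fixed when the command is typed. A directory name containing spaces or shell metacharacters would break this simple form.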
