BFG Repo Cleaner – Alternative to Fresh Clone

Question

I was going to ask this on the repository but SO seemed like a more fitting place to ask this.

I was able to use BFG Repo Cleaner (great tool, thank you!) to reduce our .git folder size by over 1GB, which is a smashing success as far as our repository is concerned. I have not pushed my bare clone to remote yet, as I am concerned with putting forward these changes before understanding the consequences of pushing and then not re-cloning.

I understand that best practice dictates that when history has changed in this way, the best solution is to perform a fresh clone. However, I work with a team of over 50 people in a repository of over 2GB and 23k commits, and cross-team coordination can be incredibly difficult under our structure. As a result, I have some questions:

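  1. What would the consequences be if I were to push these changed refs and people were to pull to their existing copy rather than create a fresh clone?
  2. Would they need to do anything else to mitigate these consequences as part of, or in addition to their pull, if this is feasible?
  3. Does this recommendation change at all if you consider that the blobs that were deleted are from history that is at least a year old and at most three years old?
  4. Finally, given that a new clone would not include any work not synced upstream, do you have a recommendation on the best way to carry over untracked branches from one clone to another? If a Git command to do this already exists, I'd love to hear your insights.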

Thanks again for creating such a handy tool, and hopefully I can finish making it useful for my team's project. I will continue to experiment on my fork in the meantime.

Recommended answer

Preface

Before we get into this, let me clarify the recommended process for cleaning Git history in the context of an active team of developers (no matter what technology is used for the cleaning, whether BFG Repo-Cleaner or git filter-branch):

  1. Practice doing the clean a few times on a local disposable copy of your repository, so you're confident that you can do it and get the desired result, and you know how long it takes.
  2. COMMUNICATE WITH YOUR TEAM. This is essential, unavoidable (because Git is specifically built to complain and get in the way if history is rewritten) and just good practice for any team :-) You need to tell them:
    • Why the clean is happening (eg smaller repo!)
    • When the clean is planned - give them suitable advance warning.
    • To push all of their work up to the main repo before the clean commences - it doesn't need to be merged to the master branch, but all work needs to be pushed up on one branch or another.
    • That they'll need to delete their old copies of the repo when the clean is done, and re-clone the newly cleaned repository.

So, to your questions:

What would the consequences be if I were to push these changed refs and people were to pull to their existing copy rather than create a fresh clone?

Bad. From experience I can say there will be a mess and people will get confused and upset.

Specifically, what happens on that person's machine is that the git pull command will merge together the old dirty history and the new cleaned history, with two long divergent histories (diverging initially with the first 'dirty' commit in your history, which in your case was 3 years ago) being joined together with one brand new and very confusing merge commit. It's seldom clear to users that this has happened - most Git log visualisers will not render this in a way likely to make it apparent - if you're lucky a user might say something like "I've got two copies of every commit now, WTF?!" - but only if they're really observant.
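
The two divergent histories described above can be made visible before anyone pulls. The following is a minimal sketch, not a definitive procedure: it builds a throwaway repository in which `commit-tree` stands in for the cleaned remote branch, and every name in it (`repo`, `g`, `local_tip`, `clean2`) is made up for the demo. It then uses `git rev-list --left-right --count` to count the commits that exist on only one side.

```shell
# Throwaway repository; all names here are hypothetical demo names.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
echo v1 > "$repo/app.txt"; g add app.txt; g commit -q -m "base"
base=$(g rev-parse HEAD)

# The 'dirty' commit, followed by a later commit that drops the big file again.
echo blob > "$repo/big.bin"; g add big.bin; g commit -q -m "add big file"
g rm -q big.bin
echo v2 > "$repo/app.txt"; g add app.txt; g commit -q -m "drop big file, edit app"
local_tip=$(g rev-parse HEAD)

# Stand-in for the cleaned remote branch: same messages, big.bin never added.
clean1=$(g commit-tree "$base^{tree}" -p "$base" -m "add big file")
clean2=$(g commit-tree "$local_tip^{tree}" -p "$clean1" -m "drop big file, edit app")

# Commits only in the old local history vs. only in the cleaned history.
counts=$(g rev-list --left-right --count "$local_tip...$clean2")
# The last commit both histories still share.
fork=$(g merge-base "$local_tip" "$clean2")

echo "local-only / cleaned-only commits: $counts"
echo "histories fork at: $fork"
```

In a real clone the equivalent check would be something like `git fetch origin` followed by `git rev-list --left-right --count master...origin/master`. When both numbers are large, the remote has most likely been rewritten, and a plain `git pull` would merge the two histories together.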

If that user later makes some new commits, and pushes back up to the main repository, they will have pushed the dirty history back up to the cleaned main repository, negating your work, making your history dirty again, and creating a very confusing Git history which all your other users will become exposed to next time they pull from the main Git repo.

Would they need to do anything else to mitigate these consequences as part of, or in addition to their pull, if this is feasible?

Technically, yes. In practice, the procedure is complex, error-prone, and if just one user gets it wrong, you are screwed just like before.

At this point, we have to work out why you're trying to dodge this procedure. Is it because:

  • You're trying to save users from having to know about and deal with the changed Git history? It sounds like this might be your goal, based on your saying "cross-team coordination can be incredibly difficult under our structure", but unfortunately it is not attainable, because Git will not let you change history without users noticing. Users will have to do something, and they will need to coordinate with you.
  • You want to reduce the download time of a fresh clone of your really massive repository, hoping that Git will only download the changed blobs and not all the stuff that didn't change? This is a slightly more reasonable goal for gigantic multi-gigabyte repos that take hours to download (though if you use the BFG to make the repo much smaller, there's less motivation). Unfortunately, due to details of the Git protocol, you won't be able to realise those benefits. The Git protocol is designed to establish which commits are on the remote server that aren't in your local repo, and to send a tailored packfile containing only what you need to bring your local repo up to date. This is great, but notice that the unit of comparison is commits. When you rewrite history, the file trees of the commits hardly change at all, but the ids of every commit from the first rewritten one onwards change, because a commit id is a hash of its parent history as well as its file-tree content. The Git protocol is only comparing commit ids, and they are all different, so all the commits will get sent, along with their file-tree objects. The protocol doesn't dig deep enough to realise that it doesn't need to send most of those file-tree objects, and so you don't get the benefit of already having copies of them in your local repo.
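
The point about commit ids can be checked in a throwaway repository. This is only an illustrative sketch: the file names, the helper `g`, and the hand-built "cleaned" commits are assumptions for the demo, and `git commit-tree` is used here as a stand-in for what a history rewrite produces, not as a description of how the BFG works internally.

```shell
# Throwaway repository; every name below is made up for the demo.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
echo "source code" > "$repo/app.txt"
echo "pretend this is 1GB" > "$repo/big.bin"
g add app.txt big.bin
g commit -q -m "add app and big file"     # the 'dirty' commit
g rm -q big.bin
g commit -q -m "remove big file"          # a later commit

dirty_child=$(g rev-parse HEAD)
child_tree=$(g rev-parse 'HEAD^{tree}')

# Hand-build a 'cleaned' history: same messages, but big.bin is never added.
clean_root=$(g commit-tree "$child_tree" -m "add app and big file")
clean_child=$(g commit-tree "$child_tree" -p "$clean_root" -m "remove big file")
clean_child_tree=$(g rev-parse "$clean_child^{tree}")

# Same tree id on both sides, but a different commit id.
echo "dirty child: $dirty_child tree $child_tree"
echo "clean child: $clean_child tree $clean_child_tree"
```

The later commit keeps exactly the same tree in both histories, yet its commit id differs because its parent differs. That is why a client holding the old history cannot reuse its existing commits when it fetches the cleaned one.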

Does this recommendation change at all if you consider that the blobs that were deleted are from history that is at least a year old and at most three years old?

If the bad stuff has been committed very recently, and no other users have pulled it yet (so, within the last few hours or minutes) you could possibly get away with quickly cleaning history on the main repo before anyone else pulls it. As soon as anyone else pulls dirty data, they need to be decontaminated, and the easiest way to do that is delete and re-clone.

If the bad stuff was committed years ago, then everyone has it, and they all need to be decontaminated.

Finally, given that a new clone would not include any work not synced upstream, do you have a recommendation on the best way to carry over untracked branches from one clone to another?

The recommended way to deal with this problem is to make sure it does not happen. Communicate with your team, tell them that the repository cleaning is going to take place, and all they have to do to make it work is make sure they've pushed all their work up on any branch to the main repository before you start the cleaning.
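
One way team members might check for work that has not been pushed anywhere is `git log --branches --not --remotes`, which lists commits reachable from local branches but from no remote-tracking branch. The sketch below is self-contained and uses a throwaway bare repository as a stand-in for the main repo; the paths and the helper `g` are assumptions for the demo.

```shell
# Throwaway 'main repo' and a clone of it; all names are hypothetical.
work=$(mktemp -d)
git init -q --bare "$work/main.git"
git clone -q "$work/main.git" "$work/clone" 2>/dev/null   # cloning an empty repo only warns
g() { git -C "$work/clone" -c user.name=demo -c user.email=demo@example.com "$@"; }

echo v1 > "$work/clone/app.txt"
g add app.txt
g commit -q -m "pushed work"
g push -q origin HEAD              # this commit now exists on the main repo

g checkout -q -b feature
echo v2 > "$work/clone/app.txt"
g commit -q -am "unpushed work"    # this one exists only locally

# Subjects of commits on any local branch that no remote-tracking branch contains.
unpushed=$(g log --branches --not --remotes --format=%s)
echo "still to push: $unpushed"
```

An empty result suggests that everything on local branches is already on some remote branch. It relies on the remote-tracking refs being up to date, so a `git fetch` beforehand is advisable, and it says nothing about stashes or uncommitted changes.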

If someone doesn't do this, they can try rebasing the branches they care about onto the cleaned history. For each feature branch, something like:

$ git rebase --onto clean-origin/feature unclean-origin/feature feature

...(which loosely translates to "take all the commits that are on my feature branch, that I didn't push to the main repo before it was cleaned, and replay them on top of the main repo's cleaned version of that branch").
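
To see what that rebase does, here is a rough sketch in a throwaway repository. Local branches named `unclean-feature` and `clean-feature` stand in for the `unclean-origin/feature` and `clean-origin/feature` refs in the command above. Those names, the helper `g`, and the hand-built cleaned commits are all assumptions for the demo.

```shell
# Throwaway repository; branch and file names are made up for the demo.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
echo v1 > "$repo/app.txt"; echo blob > "$repo/big.bin"
g add app.txt big.bin
g commit -q -m "add app and big file"
g rm -q big.bin
g commit -q -m "remove big file"
g branch unclean-feature           # the main repo's branch before the clean

# One commit that was never pushed before the clean.
g checkout -q -b feature
echo v2 > "$repo/app.txt"
g commit -q -am "my unpushed change"

# Stand-in for the cleaned version of the same branch.
clean_root=$(g commit-tree 'unclean-feature^{tree}' -m "add app and big file")
clean_tip=$(g commit-tree 'unclean-feature^{tree}' -p "$clean_root" -m "remove big file")
g branch clean-feature "$clean_tip"

# Replay only the commits after unclean-feature onto the cleaned branch.
g rebase -q --onto clean-feature unclean-feature feature

new_parent=$(g rev-parse 'feature^')
commit_count=$(g rev-list --count feature)
top_subject=$(g log -1 --format=%s feature)
echo "feature now sits on $new_parent with $commit_count commits"
```

After the rebase, `feature` consists of the two cleaned commits plus the replayed unpushed commit, and none of the old dirty commits are reachable from it any more.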

If the user gets this wrong, or forgets to do it for just one branch, you will be back to the bad mixed dirty/clean history scenario.

You know your team: are you sure they can all perform esoteric Git rebasing operations flawlessly? And what is the benefit if they do? After all is said and done, isn't it easier just to tell them to delete their old repo and re-clone?
