Deduplicate Git forks on a server


Question

Is there a way to hard-link all the duplicate objects in a folder containing multiple Git repositories?

Explanation:

I host a Git server on my company's server (a Linux machine). The idea is to have one main canonical repository to which no user has push access. Instead, every user forks it, that is, clones the canonical repository into their own home directory on the server, which in practice creates hard links to its objects.

/canonical/Repo
/Dev1/Repo (objects hard-linked to /canonical/Repo when initially cloned)
/Dev2/Repo (objects hard-linked to /canonical/Repo when initially cloned)
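
To make the "hard-linked when initially cloned" part concrete, here is a minimal sketch that clones a repository from a plain local path and compares inode numbers of one object file. The directory names (`canonical`, `dev1.git`) and the commit identity are made up for the demo, and it assumes both repositories live on the same filesystem, since hard links cannot cross filesystems.

```shell
#!/bin/sh
# Sketch: check whether a local clone shares object files via hard links.
set -e
work=$(mktemp -d)   # throwaway sandbox; all repository names are hypothetical
cd "$work"

git init -q canonical
git -C canonical -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "initial commit"

# Cloning from a plain local path lets Git hard-link the object files.
git clone -q --bare canonical dev1.git

# Pick one loose object and compare its inode number in both repositories.
obj=$(cd canonical/.git/objects && find . -type f -path './??/*' | head -n 1)
ino_canonical=$(ls -i "canonical/.git/objects/$obj" | awk '{print $1}')
ino_fork=$(ls -i "dev1.git/objects/$obj" | awk '{print $1}')
echo "canonical inode: $ino_canonical, fork inode: $ino_fork"
```

Equal inode numbers mean both paths point at the same data on disk. Objects that arrive later through a push are written as new files, which is why the sharing is lost over time.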

This all works fine. The problem arises when:

Dev1: pushes a huge commit to his fork on the server (/Dev1/Repo).
Dev2: fetches that commit on his local system, makes his own changes, and pushes the result to his own fork on the server (/Dev2/Repo).

(Now the same huge file resides in both developers' forks on the server. Git does not create a hard link automatically.)

This is eating up my server space like crazy!

How can I create hard links between objects that are duplicated across the two forks (or between a fork and the canonical repository, for that matter), so that server space is saved and each developer still gets all the data when cloning from his or her fork onto a local machine?

Recommended answer

I decided to do it this way:

shared-objects-database.git/
foo.git/
  objects/info/alternates (will contain ../../shared-objects-database.git/objects)
bar.git/
  objects/info/alternates (will contain ../../shared-objects-database.git/objects)
baz.git/
  objects/info/alternates (will contain ../../shared-objects-database.git/objects)

Every fork has an entry in its objects/info/alternates file that gives a relative path to the objects directory of the object database repository.
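
As a small illustration of that entry, the sketch below creates two empty bare repositories in a temporary directory, writes the alternates file for one of them, and asks Git to report it. The sandbox and the repository names are placeholders taken from the layout above. The relative path is resolved against the fork's own objects directory, which is why it starts with two `..` components.

```shell
#!/bin/sh
# Sketch: point one fork at a shared object store via objects/info/alternates.
set -e
work=$(mktemp -d)   # throwaway sandbox; all names below are hypothetical
cd "$work"

git init -q --bare shared-objects-database.git
git init -q --bare foo.git

# The path is interpreted relative to foo.git/objects, hence the two "..".
echo ../../shared-objects-database.git/objects >foo.git/objects/info/alternates

# "git count-objects -v" lists each alternate object directory it picked up.
alt_line=$(git -C foo.git count-objects -v | grep '^alternate:')
echo "$alt_line"
```

If no `alternate:` line shows up, the path in the file most likely does not resolve to an existing objects directory.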

It is important that the object database is itself a repository, because that lets it hold the objects and refs of different users, even when their repositories have the same name.

Steps:

  1. git init --bare shared-objects-database.git
  2. I run the following lines either every time there is a push to any fork (via a post-receive hook) or from a cron job:

for r in list-of-forks
do
    (
        cd "$r" &&
        # Mirror every ref of this fork into the shared repository.
        git push ../shared-objects-database.git "refs/*:refs/remotes/$r/*" &&
        # To be safe, I rewrite the alternates entry every time.
        echo ../../shared-objects-database.git/objects >objects/info/alternates
    )
done
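
For the post-receive variant, one possible shape of the hook is sketched below. It is not the author's hook: the hook body, the `dev` working repository, the branch name `main`, and the use of the fork's directory name as the ref namespace are all assumptions made for the demo. The sketch installs the hook into a throwaway fork, simulates one push, and lists what ended up in the shared repository.

```shell
#!/bin/sh
# Sketch: a per-fork post-receive hook that mirrors refs into the shared store.
set -e
work=$(mktemp -d)   # throwaway sandbox; all names below are hypothetical
cd "$work"

git init -q --bare shared-objects-database.git
git init -q --bare foo.git

mkdir -p foo.git/hooks
cat >foo.git/hooks/post-receive <<'EOF'
#!/bin/sh
# Git runs this inside foo.git after every successful push.
name=$(basename "$PWD")
git push -q ../shared-objects-database.git "refs/*:refs/remotes/$name/*" &&
echo ../../shared-objects-database.git/objects >objects/info/alternates
EOF
chmod +x foo.git/hooks/post-receive

# Simulate a developer pushing one commit to the fork.
git init -q dev
git -C dev -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "first commit"
git -C dev push -q ../foo.git HEAD:refs/heads/main

# The fork's branch should now be mirrored under refs/remotes/foo.git/.
mirrored=$(git -C shared-objects-database.git for-each-ref --format='%(refname)')
echo "$mirrored"
```

A hook keeps the shared store current after each push, while the cron job variant batches the work and keeps pushes fast. Which trade-off fits better depends on how large and how frequent the pushes are.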

Then, on the next "git gc", all objects in the forks that already exist in the alternate will be deleted.

git repack -adl is also an option!
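
The following sketch runs the whole cycle on two throwaway forks and prints `git count-objects -v` before and after the repack, so the effect can be inspected. The repository names, the `big.bin` file, and the `main` branch are made up. It also packs the shared store first; that step is an assumption of this sketch, based on the understanding that a repack only drops local loose copies of objects that are present in some pack.

```shell
#!/bin/sh
# Sketch: share objects between two forks, then repack the forks locally.
set -e
work=$(mktemp -d)   # throwaway sandbox; all names below are hypothetical
cd "$work"

git init -q --bare shared-objects-database.git
git init -q dev
echo "pretend this is a huge file" >dev/big.bin
git -C dev add big.bin
git -C dev -c user.name=Dev -c user.email=dev@example.com commit -q -m "add big.bin"

for r in foo.git bar.git
do
    git init -q --bare "$r"
    git -C dev push -q "../$r" HEAD:refs/heads/main
    (
        cd "$r" &&
        git push -q ../shared-objects-database.git "refs/*:refs/remotes/$r/*" &&
        echo ../../shared-objects-database.git/objects >objects/info/alternates
    )
done

# Assumption of this sketch: pack the shared store before repacking the forks.
git -C shared-objects-database.git repack -a -d -q

for r in foo.git bar.git
do
    echo "== $r before =="; git -C "$r" count-objects -v
    # -l (--local) leaves objects borrowed from the alternate out of the fork's packs.
    git -C "$r" repack -a -d -l -q
    echo "== $r after =="; git -C "$r" count-objects -v
done
```

Whatever the counts show, both forks should still be able to read every object, because lookups fall through to the shared store.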

This way we save space: two users who push the same data to their respective forks on the server will share the objects.

We need to set the gc.pruneExpire variable to never in shared-objects-database.git. Just to be safe!
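
Setting that variable is a single `git config` call inside the shared repository. The sketch below does it in a throwaway directory (the sandbox path is a placeholder) and reads the value back.

```shell
#!/bin/sh
# Sketch: stop "git gc" in the shared store from pruning loose objects.
set -e
work=$(mktemp -d)   # throwaway sandbox; the repository name follows the layout above
cd "$work"

git init -q --bare shared-objects-database.git
git -C shared-objects-database.git config gc.pruneExpire never

# Read the setting back to confirm it was stored in the repository's config.
git -C shared-objects-database.git config --get gc.pruneExpire
```

The setting matters because forks that borrow objects through alternates hold no lock on them, so an object pruned from the shared store would be missing from every fork that relies on it.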

To prune objects occasionally, add all forks as remotes to the shared repository, fetch, and prune. Git will do the rest!
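
One way to read "add remotes, fetch, and prune" is sketched below. It assumes that "prune" means `git fetch --prune` for stale refs followed by an explicit `git prune` for unreachable loose objects. The remote names `foo` and `bar`, the branch `main`, and the deliberately orphaned blob are invented for the demo.

```shell
#!/bin/sh
# Sketch: refresh the shared store from all forks, then drop unreachable objects.
set -e
work=$(mktemp -d)   # throwaway sandbox; all names below are hypothetical
cd "$work"

git init -q --bare shared-objects-database.git
git init -q dev
git -C dev -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "initial commit"
for r in foo bar
do
    git init -q --bare "$r.git"
    git -C dev push -q "../$r.git" HEAD:refs/heads/main
done

cd shared-objects-database.git
# An object that no fork references, standing in for stale data.
junk=$(echo "stale data" | git hash-object -w --stdin)

for r in foo bar
do
    git remote add "$r" "../$r.git"
done
git fetch -q --all --prune   # update remote-tracking refs, drop stale ones
git prune                    # delete loose objects that no ref reaches

git for-each-ref --format='%(refname)'
```

Fetching from every fork first is what makes the prune safe: anything a fork still references is reachable from a remote-tracking ref in the shared repository at the moment of pruning.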

(I finally found a solution that works for me! Not tested in production, though! :p Thanks to this post.)
