Deduplicate Git forks on a server


Problem Description


Is there a way to hard-link all the duplicate objects in a folder containing multiple Git repositories?

Explanation:


I am hosting a Git server on my company server (a Linux machine). The idea is to have a main canonical repository, to which users do not have push access; instead, every user forks the canonical repository (clones it into the user's home directory on the server, which initially creates hard links to the objects).


/canonical/Repo
/Dev1/Repo (objects hard-linked to /canonical/Repo when initially cloned)
/Dev2/Repo (objects hard-linked to /canonical/Repo when initially cloned)
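For context on why the initial forks are cheap: a local clone on the same filesystem hard-links the object files by default. A minimal illustration (the --bare flag is an assumption about how the server-side copies are created):

    # A local clone on the same filesystem hard-links the files under objects/
    # by default, so the new fork initially costs almost no extra disk space.
    # (--bare is an assumption about the server-side setup.)
    git clone --bare /canonical/Repo /Dev1/Repo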


This all works fine. The problem arises when:


Dev1: pushes a huge commit onto his fork on the server (/Dev1/Repo).
Dev2: fetches that on his local system, makes his own changes, and pushes them to his own fork on the server (/Dev2/Repo).


(Now the same 'huge' file resides in both developers' forks on the server. Git does not create a hard link automatically in this case.)


This is eating up my server space like crazy!


How can I create hard links between the objects that are duplicated across the two forks (or the canonical repository, for that matter), so that server space is saved, while each developer still gets all the data when cloning from his/her fork onto his/her local machine?

Recommended Answer

I have decided to do it this way:

shared-objects-database.git/
foo.git/
  objects/info/alternates (will have ../../shared-objects-database.git/objects)
bar.git/
  objects/info/alternates (will have ../../shared-objects-database.git/objects)
baz.git/
  objects/info/alternates (will have ../../shared-objects-database.git/objects)


All the forks will have an entry in their objects/info/alternates file that gives a relative path to the objects' database repository.
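To make the mechanism concrete, here is an illustrative check on one fork (foo.git from the layout above); note that the path in alternates is resolved relative to the fork's objects/ directory:

    # The single line in the alternates file points two levels up, then into
    # the shared database's objects/ directory:
    cat foo.git/objects/info/alternates
    # ../../shared-objects-database.git/objects

    # A connectivity check confirms the fork can still reach every object,
    # including those stored only in the shared database:
    git -C foo.git fsck --connectivity-only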


It is important to make the object database itself a repository, because that way we can store the objects and refs of different users even when their repositories have the same name.
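For example, because each fork's refs are pushed into its own refs/remotes/ namespace (see the loop below), the shared repository can hold refs from identically named forks side by side. A hypothetical listing:

    # List the refs that have been pushed into the shared database so far.
    git -C shared-objects-database.git for-each-ref --format='%(refname)' refs/remotes/
    # hypothetical output, one namespace per fork:
    #   refs/remotes/Dev1-Repo/master
    #   refs/remotes/Dev2-Repo/master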

Steps:


  1. git init --bare shared-objects-database.git
  2. I run the following lines of code either every time there is a push to any fork (via a post-receive hook) or from a cron job:

for r in list-of-forks
do
    (
        cd "$r" &&
        git push ../shared-objects-database.git "refs/*:refs/remotes/$r/*" &&
        # to be safe, I add the shared ("fat") objects database to alternates every time
        echo ../../shared-objects-database.git/objects > objects/info/alternates
    )
done
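The hook itself is not shown above; a minimal sketch of a per-fork post-receive hook, assuming the same flat layout as in the loop (foo.git, bar.git, baz.git next to shared-objects-database.git), might look like this:

    #!/bin/sh
    # Sketch: hooks/post-receive inside a fork such as foo.git from the layout above.
    # Git runs hooks of a bare repository with the repository as the working directory.
    fork=$(basename "$PWD" .git)     # e.g. "foo"
    git push ../shared-objects-database.git "refs/*:refs/remotes/$fork/*"
    echo ../../shared-objects-database.git/objects > objects/info/alternates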


Then, on the next git gc, all objects in a fork that already exist in the alternate object database will be removed from that fork.

git repack -adl is also an option!
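For a quick before/after check on a single fork (illustrative commands, run inside the fork):

    git count-objects -v     # note "size" and "size-pack" before repacking
    git repack -adl          # -l omits objects that are available via alternates
    git count-objects -v     # the local figures should shrink for shared objects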


This way we save space: two users pushing the same data to their respective forks on the server will share the objects.


We need to set the gc.pruneExpire variable to never in the shared-objects-database. Just to be safe!
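Concretely, that is one config setting on the shared repository (path assumed from the layout above):

    # Never let "git gc" expire unreachable objects in the shared database;
    # a fork may still depend on them even if no ref here points at them.
    git -C shared-objects-database.git config gc.pruneExpire never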


To occasionally prune objects, add all forks as remotes to the shared repository, fetch, and prune! Git will do the rest!
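A hedged sketch of that occasional cleanup, with fork names and paths assumed for illustration:

    # Run inside shared-objects-database.git.
    git remote add dev1 /Dev1/Repo    # assumed fork paths
    git remote add dev2 /Dev2/Repo
    git fetch --all --prune           # fetch every object the forks still reference
    git gc --prune=now                # deliberately overrides gc.pruneExpire=never for this run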


(I finally found a solution that works for me! Not tested in production! :p Thanks to this post.)
