Git: cloning with shared local storage (using hard links)

Question

I'd like to make it easy for a large number of devs to repeatedly clone a very large and remote git repo. Some sort of local per-user 'caching' is necessary. There are obviously lots of ways to do this; I'm just surprised that the one way that seems most natural to me does not exist in git.

Is there an industry standard practice for this?
Is there some git option that I'm just misunderstanding?

Ideal solution

    # first clone - very slow
    git clone ssh://remote.repo/repo.git repo1
    # subsequent clones - lightning fast
    git clone --shared-with-hard-links repo1 ssh://remote.repo/repo.git repo2

In this imaginary solution, no .git/objects/info/alternates is created; objects are simply shared at clone time using hard links, like rsync's --link-dest option, or like git's own behavior when cloning a repo that is on the local filesystem.
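
For comparison, this is roughly how rsync's --link-dest behaves; the paths here are purely illustrative:

    # Files identical to those under --link-dest are hard-linked rather than
    # copied, so unchanged content consumes no extra space.
    rsync -a --link-dest=/backups/monday/ /data/ /backups/tuesday/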

None of the alternatives I see are that attractive:

1. I can do git clone --reference repo1 ssh://remote.repo/repo.git repo2, which relies on repo1 existing; if repo1 is deleted, then repo2 is fubared.
2. I can do git clone --dissociate --reference repo1 ssh://remote.repo/repo.git repo2, but storage is not shared, so now I've used twice the storage I wanted, and it's probably still relatively slow for that reason.
3. There are various hacks of varying complexity that may need wrappers around cloning and pulling. The complexity is, compared to real programming, obviously trivial, but running your SCM under a bunch of wrappers is just a nuisance that should really be avoided.
   • Store a git 'cache' repo in a central location on each dev's PC and have a wrapper around clone that automatically fetches into the cache first and then runs clone --reference <cache>.
   • Remember every clone that is done; subsequent clones look for a pre-existing local clone, clone locally from that (creating hard links), and then fix up the remotes afterwards. Roughly, it goes something like this:

    # find any existing clones... repo1
    git clone /path/to/repo1 repo2
    cd repo2
    git remote rm origin
    git remote add origin ssh://remote.repo/repo.git
    git fetch
    # Abandon any local changes made in the other workspace: point every local
    # branch at the matching remote-tracking branch.
    # ($gitdir is the new clone's .git directory, e.g. repo2/.git)
    for ref in $(git --git-dir "$gitdir" for-each-ref refs/heads --format "%(refname)"); do
        refbase=$(basename "$ref")
        git --git-dir "$gitdir" update-ref "$ref" "refs/remotes/origin/$refbase"
    done
      

But it all seems like a hack. Surely there's a better way?

Thanks,
Mort

Notes:

• We actually do have a LAN-local mirror. The repo is large enough that we need better than just that to achieve reasonable clone speeds.
• The repo is big: 11 min to clone over GigE, and up to 40 min if the user is on Windows.

Update

The best thing that I can figure out to do is to have a cache in /var/cache/git/<repo_name>.git that is a clone --mirror of the central repo. New clones use the --shared option both to reduce space/time on the initial clone and to speed up subsequent fetches. There is a wrapper script to clone a new workspace that does this:

    # refresh the cache from the central repo
    git --git-dir /var/cache/git/<repo_name>.git remote update
    # clone from the cache via alternates (no objects are copied)
    git clone --shared /var/cache/git/<repo_name>.git
    cd <repo_name>
    # point origin back at the central repo
    git remote set-url origin ssh://remote.repo/repo.git

I would have preferred something that relied on hard links, because hard links are immune to problems if objects are somehow removed from the shared cache. But I guess that does not exist.
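
One partial mitigation for that worry: an alternates-based clone can later be detached from the cache by copying in the borrowed objects, which is what clone --dissociate automates under the hood. A sketch, with an illustrative workspace path:

    # copy any objects borrowed from the cache into the clone's own object store
    git -C /path/to/workspace repack -a -d
    # then drop the dependency on the cache
    rm /path/to/workspace/.git/objects/info/alternates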

Solution

Git hardlinks objects by default when you clone a local repository. So you can:

    # a local clone hard-links the object files instead of copying them
    git clone /path/to/repo /path/to/clone
    cd /path/to/clone
    # keep a remote pointing at the central repository for future updates
    git remote add upstream http://example.com/path/to/repo/to/clone
    git fetch upstream
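If you want to confirm that the local clone really does share storage, hard-linked object files show a link count greater than 1; the path below is illustrative:

    # the second column of ls -l is the link count; >1 means shared with the source repo
    find /path/to/clone/.git/objects -type f -exec ls -l {} + | head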

But this has a number of disadvantages:

• The next git gc will break the hardlinks and eat your disk space.
• This works only if /path/to/repo and /path/to/clone are on the same partition.
• You have to be careful with the tools you use on the result; e.g. rsync without -H will turn the hardlinks into independent full copies (see the example below).
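
For instance, to copy such a tree elsewhere without losing the space savings (paths illustrative):

    # -aH preserves hard links; plain -a would write each linked file as a full copy
    rsync -aH /path/to/repos/ /backup/repos/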

I think the .git/objects/info/alternates mechanism is much better in most cases.
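
For reference, --shared (like --reference) just records the source object store in a one-line text file, which is why the clone costs almost no extra space; the path is illustrative:

    $ cat /path/to/clone/.git/objects/info/alternates
    /path/to/repo/.git/objects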
