How to speed up / parallelize downloads of git submodules using git clone --recursive?


Question

Cloning git repositories that have a lot of submodules takes a really long time. The following example has ~100 submodules:

git clone --recursive https://github.com/Whonix/Whonix

Git clones them one by one, which takes much longer than necessary. Let's make the (probable) assumption that both the client and the server have sufficient resources to answer multiple (parallel) requests at the same time.

How can I speed up / parallelize the download of git submodules with git clone --recursive?

Recommended answer

When I run your command, it takes 338 seconds of wall-clock time to download the 68 MB.

With the following Python program, which relies on GNU parallel being installed,

#! /usr/bin/env python
# coding: utf-8

from __future__ import print_function

import os
import subprocess

jobs=16

modules_file = '.gitmodules'

packages = []

if not os.path.exists('Whonix/' + modules_file):
    subprocess.call(['git', 'clone', 'https://github.com/Whonix/Whonix'])

os.chdir('Whonix')

# get list of packages from .gitmodules file
with open(modules_file) as ifp:
    for line in ifp:
        if not line.startswith('[submodule '):
            continue
        package = line.split(' "', 1)[1].split('"', 1)[0]
        #print(package)
        packages.append(package)

def doit():
    p = subprocess.Popen(['parallel', '-N1', '-j{0}'.format(jobs),
                          'git', 'submodule', 'update', '--init',
                          ':::'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)  # text-mode pipes, so this also runs on Python 3
    res = p.communicate('\n'.join(packages))
    print(res[0])
    if res[1]:
        print("error", res[1])
    print('git exit value', p.returncode)
    return p.returncode

# sometimes one of the updates interferes with the others and generate lock
# errors, so we retry
for x in range(10):
    if doit() == 0:
        print('zero exit from git after {0} times'.format(x+1))
        break
else:
    print('could not get a zero exit from git after {0} times'.format(
          x+1))

that time is reduced to 45 seconds (on the same system; I did not do multiple runs to average out fluctuations).
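If GNU parallel is not available, the same idea can probably be expressed with a thread pool from the Python standard library. The following is only a sketch, not what was timed above: it assumes Python 3, it assumes (like the script above) that each submodule's name in .gitmodules is also its path, and the function names parse_submodule_names, update_submodule, update_all and main are made up for this illustration. Instead of re-running everything after a failure, it retries only the submodules whose update returned a non-zero exit code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

PREFIX = '[submodule "'
SUFFIX = '"]'


def parse_submodule_names(gitmodules_text):
    """Return the quoted names from '[submodule "name"]' section headers."""
    names = []
    for line in gitmodules_text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX) and line.endswith(SUFFIX):
            names.append(line[len(PREFIX):-len(SUFFIX)])
    return names


def update_submodule(path, repo_dir="."):
    """Run 'git submodule update --init' for one path; return the exit code."""
    return subprocess.call(
        ["git", "submodule", "update", "--init", "--", path], cwd=repo_dir)


def update_all(paths, run=update_submodule, jobs=16, max_rounds=10):
    """Update paths in parallel, retrying only the ones that failed.

    Returns the paths that still fail after max_rounds rounds
    (an empty list means every update exited with 0).
    """
    pending = list(paths)
    for _ in range(max_rounds):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            codes = list(pool.map(run, pending))
        pending = [p for p, code in zip(pending, codes) if code != 0]
    return pending


def main():
    # Assumption: the current directory is the already cloned superproject.
    with open(".gitmodules") as fp:
        names = parse_submodule_names(fp.read())
    failed = update_all(names)
    print("still failing:", failed)
```

Calling main() from inside the cloned Whonix directory would play the role of the retry loop above. Threads are enough here because each worker only waits for a git child process, so the downloads themselves run in separate processes.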

To check that things were OK, I "compared" the checked-out files with:

find Whonix -name ".git" -prune -o -type f -print0 | xargs -0 md5sum > /tmp/md5.sum

in one directory and

md5sum -c /tmp/md5.sum

in the other directory, and vice versa.
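The same comparison can be sketched in Python, which avoids running the check once in each direction. This is an illustration rather than the command used above: tree_digests and diff_trees are hypothetical helper names, and the sketch assumes that skipping everything named .git and hashing only regular files is a close enough match for the find expression.

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file (path relative to root) to its MD5 hex digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into .git directories (like 'find -name .git -prune').
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Submodule checkouts contain a .git *file*; skip it as well,
            # and skip symlinks and other non-regular files ('-type f').
            if name == ".git" or os.path.islink(full) or not os.path.isfile(full):
                continue
            md5 = hashlib.md5()
            with open(full, "rb") as fp:
                for chunk in iter(lambda: fp.read(1 << 20), b""):
                    md5.update(chunk)
            digests[os.path.relpath(full, root)] = md5.hexdigest()
    return digests


def diff_trees(root_a, root_b):
    """Return sorted relative paths that are missing on one side or differ."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    return sorted(p for p in set(a) | set(b) if a.get(p) != b.get(p))
```

An empty list from diff_trees("serial/Whonix", "parallel/Whonix") (two hypothetical checkout locations) would mean both clones contain the same files with the same contents.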
