How to speed up / parallelize downloads of git submodules using git clone --recursive?
Question
Cloning git repositories that have a lot of submodules takes a really long time. The following example has ~100 submodules:
git clone --recursive https://github.com/Whonix/Whonix
Git clones them one by one, which takes much longer than required. Let's make the (probable) assumption that both the client and the server have sufficient resources to answer multiple (parallel) requests at the same time.
How can downloads of git submodules be sped up / parallelized when using git clone --recursive?
Recommended answer
When I run your command, it takes 338 seconds of wall-clock time to download the 68 MB.
With the following Python program, which relies on GNU parallel being installed,
#! /usr/bin/env python
# coding: utf-8

from __future__ import print_function

import os
import subprocess

jobs = 16  # number of git processes GNU parallel runs at once
modules_file = '.gitmodules'
packages = []

# clone the top-level repository (without submodules) if it is not there yet
if not os.path.exists('Whonix/' + modules_file):
    subprocess.call(['git', 'clone', 'https://github.com/Whonix/Whonix'])
os.chdir('Whonix')

# get list of packages from .gitmodules file
with open(modules_file) as ifp:
    for line in ifp:
        if not line.startswith('[submodule '):
            continue
        package = line.split(' "', 1)[1].split('"', 1)[0]
        #print(package)
        packages.append(package)


def doit():
    # universal_newlines=True lets communicate() take and return text
    # on both Python 2 and Python 3
    p = subprocess.Popen(['parallel', '-N1', '-j{0}'.format(jobs),
                          'git', 'submodule', 'update', '--init',
                          ':::'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
    # feed the submodule names to parallel, one per line
    res = p.communicate('\n'.join(packages))
    print(res[0])
    if res[1]:
        print("error", res[1])
    print('git exit value', p.returncode)
    return p.returncode


# sometimes one of the updates interferes with the others and generates lock
# errors, so we retry
for x in range(10):
    if doit() == 0:
        print('zero exit from git after {0} times'.format(x+1))
        break
else:
    print('could not get a zero exit from git after {0} times'.format(
        x+1))
that time is reduced to 45 seconds (on the same system; I did not do multiple runs to average out fluctuations).
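If GNU parallel is not available, the same idea can be sketched with Python's standard library alone. This is only an illustration, not the program that was timed above: the helper names parse_submodule_names, update_one and update_all are hypothetical, and the sketch assumes (as the program above does) that each submodule's name in .gitmodules is also its path.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Matches section headers such as: [submodule "packages/foo"]
SUBMODULE_HEADER = re.compile(r'^\s*\[submodule\s+"(?P<name>[^"]+)"\]')


def parse_submodule_names(gitmodules_text):
    """Return the submodule names listed in the text of a .gitmodules file."""
    names = []
    for line in gitmodules_text.splitlines():
        match = SUBMODULE_HEADER.match(line)
        if match:
            names.append(match.group("name"))
    return names


def update_one(name, repo_dir):
    """Run `git submodule update --init <name>` and return git's exit code.

    Assumption: the submodule name is also its path inside repo_dir.
    """
    return subprocess.call(
        ["git", "submodule", "update", "--init", name], cwd=repo_dir
    )


def update_all(names, repo_dir, jobs=16, run=update_one):
    """Update the submodules with up to `jobs` git processes at once.

    Returns the names whose update exited non-zero, so the caller can retry
    them (concurrent updates can occasionally fail with lock errors).
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(lambda name: run(name, repo_dir), names))
    return [name for name, code in zip(names, codes) if code != 0]
```

A caller would read Whonix/.gitmodules, pass its text to parse_submodule_names, and call update_all repeatedly on the returned failures until the list is empty or a retry limit is reached. Threads are sufficient here because the actual work happens in the git child processes.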
To check if things were OK, I "compared" the checked out files with:
find Whonix -name ".git" -prune -o -type f -print0 | xargs -0 md5sum > /tmp/md5.sum
in one directory and
md5sum -c /tmp/md5.sum
in the other directory, and vice versa.
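The same comparison can be sketched in Python for systems without find and md5sum. This is a rough equivalent under stated assumptions, not the check that was actually run: the function names file_md5, tree_checksums and diff_trees are hypothetical, and like the find command it skips anything named .git and looks only at regular files.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    sums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into .git directories
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            if filename == ".git":
                continue  # a submodule checkout may have a .git *file*
            full = os.path.join(dirpath, filename)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # regular files only, like `find -type f`
            sums[os.path.relpath(full, root)] = file_md5(full)
    return sums


def diff_trees(root_a, root_b):
    """Return the sorted relative paths that are missing or differ."""
    sums_a = tree_checksums(root_a)
    sums_b = tree_checksums(root_b)
    return sorted(
        path
        for path in set(sums_a) | set(sums_b)
        if sums_a.get(path) != sums_b.get(path)
    )
```

An empty list from diff_trees for the sequentially cloned tree and the tree cloned in parallel corresponds to md5sum -c succeeding in both directions.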