git pull multiple remotes in parallel
Problem description
I have a repo with thousands of remotes, and I'd like to pull from all of them at the same time. Ideally, I could specify a maximum number of pulls to run concurrently.
I wasn't able to find anything related to this in the man pages, on Google, or on git-scm online.
To be perfectly clear: I do not want to run one command over multiple repos, I have one repo with thousands of remotes.
This has nothing to do with submodules, don't talk about submodules. Submodules are unrelated to git remotes.
I'm pretty sure you have to write your own code to do this.
As CodeWizard says in a comment, Git needs to lock parts of the repository. Some of these locks are bound to collide at times if you simply run multiple git fetch processes in parallel within a single repository.
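One way to write that code yourself is sketched below, under some assumptions: a thread pool caps how many fetches run at once, and a failed fetch is retried a few times on the guess that the failure was a lock collision. The names fetch_all, fetch_with_retry, and git_fetch, the retry count, and the back-off delay are all made up for illustration. The sketch does not inspect why a fetch failed, so a remote that is simply unreachable gets retried too.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def git_fetch(remote, repo_dir="."):
    """Run `git fetch <remote>` inside repo_dir; return True on success."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "fetch", remote],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def fetch_with_retry(remote, fetch_one=git_fetch, attempts=3, delay=1.0):
    """Try a fetch up to `attempts` times (assumed policy: linear back-off)."""
    for attempt in range(attempts):
        if fetch_one(remote):
            return True
        if attempt + 1 < attempts:
            # The failure may have been a lock collision; wait and try again.
            time.sleep(delay * (attempt + 1))
    return False


def fetch_all(remotes, max_workers=8, fetch_one=git_fetch, attempts=3, delay=1.0):
    """Fetch every remote, with at most max_workers fetches running at once.

    Returns a dict mapping each remote name to True (fetched) or False.
    """
    remotes = list(remotes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = pool.map(
            lambda r: fetch_with_retry(r, fetch_one, attempts, delay),
            remotes,
        )
        return dict(zip(remotes, outcomes))
```

The list of remote names can come from the output of git remote, one name per line. Passing a different fetch_one makes it possible to try the scheduling logic without touching a real repository.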
You might also want some kind of remote-ordering strategy. For example, if remoteB is generally (but not always) a superset of remoteA and remoteC, fetching from remoteA, remoteB, and remoteC in parallel may bring in 10000 objects from remoteB that largely duplicate what the other two deliver (see footnote 1). While this also applies to sequential git fetch operations, it becomes considerably less important there. Suppose, for example, that there are 5000 objects (some commits, some trees, and some blobs) on A that you do not yet have, 5000 others on C, and all 10000 on B. If you fetch sequentially, in any order, you pick up either 5k, then 5k, then 0; or 10k, then 0, then 0, because by the time you move to the next remote, you have already collected and stored the 5k or 10k incoming objects. But if you do all three in parallel, you will bring in 5k, 5k, and 10k objects, and only then discover that you have doubled your workload.
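The arithmetic in that example can be checked with a toy model. This is only a sketch: it treats each remote as a plain set of object IDs and assumes a fetch transfers exactly the objects missing locally when it starts. Real Git negotiation is more involved, and the function names here are invented for illustration.

```python
def simulate_sequential(remotes, have=frozenset()):
    """Objects transferred per remote when fetches run one after another."""
    have = set(have)
    transferred = []
    for objects in remotes:
        missing = objects - have   # only what we still lack is sent
        transferred.append(len(missing))
        have |= missing            # stored before the next fetch starts
    return transferred


def simulate_parallel(remotes, have=frozenset()):
    """Objects transferred per remote when all fetches start from the same state."""
    have = set(have)
    return [len(objects - have) for objects in remotes]


a = set(range(0, 5000))        # 5000 objects found on A (and on B)
c = set(range(5000, 10000))    # 5000 other objects found on C (and on B)
b = a | c                      # B holds all 10000

print(simulate_sequential([a, b, c]))  # [5000, 5000, 0]
print(simulate_sequential([b, a, c]))  # [10000, 0, 0]
print(simulate_parallel([a, b, c]))    # [5000, 10000, 5000]
```

In this model the sequential runs move 10000 objects in total, while the parallel run moves 20000.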
1. If B is always a superset, simply go to B first (sequentially), then go to A and C in parallel solely for their references, which will point to objects you now have.
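A rough sketch of that ordering follows, assuming you already know which remote is the superset. The function name fetch_superset_first and its parameters are hypothetical. The superset is fetched on its own first, and only then are the remaining remotes fetched through a bounded thread pool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def git_fetch(remote, repo_dir="."):
    """Run `git fetch <remote>` inside repo_dir; return True on success."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "fetch", remote],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def fetch_superset_first(superset, others, max_workers=8, fetch_one=git_fetch):
    """Fetch `superset` alone, then fetch `others` in parallel.

    Returns a dict mapping each remote name to True (fetched) or False.
    """
    # Step 1: sequential fetch of the remote expected to hold most objects.
    results = {superset: fetch_one(superset)}

    # Step 2: the rest should now mostly need only their references updated.
    others = list(others)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results.update(zip(others, pool.map(fetch_one, others)))
    return results
```

The parallel step can still run into the lock collisions mentioned earlier, so in practice it would probably need some retry handling as well.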