Shallow AND Sparse GIT Repository Clone

Question

I have a shallow cloned git repository that is over 1 GB. I use sparse checkout for the files/dirs needed.

How can I reduce the repository clone to just the sparse checkout files/dirs?

Initially I was able to limit the cloned repository to only the sparse checkout by disabling checkout when cloning, then setting up sparse checkout before doing the initial checkout. This limited the repository to only about 200 MB, which is much more manageable. However, updating remote branch info at some point in the future causes the rest of the files and dirs to be included in the repository clone, sending the repo clone size back to over 1 GB, and I don't know how to limit it to just the sparse checkout files and dirs.

In short what I want is a shallow AND sparse repository clone. Not just sparse checkout of a shallow repo clone. The full repo is a waste of space and performance for certain tasks suffers.

Hope someone can share a solution. Thanks.

Recommended answer

Shallow and sparse means "partial" or "narrow".

A partial clone (or "narrow clone") is in theory possible, and was implemented first in Dec 2017 with Git 2.16, as seen here.
But:

  • only with Git 2.18 could you do such a partial clone: see here for a test example.
  • only with a server supporting a transport protocol V2, and Git 2.19: that would ensure that only the minimal amount of data is indeed transferred.
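
The version requirements above can be checked before attempting a partial clone. The following is a minimal sketch, assuming a POSIX shell and a `sort` that supports `-V` (GNU coreutils and recent BSD versions do). The `version_at_least` helper is a hypothetical name used for illustration, not a Git command.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# "git version" prints e.g. "git version 2.39.2"; the third field is the number.
current=$(git version | awk '{print $3}')

if version_at_least "$current" 2.19.0; then
    echo "Git $current: partial clone over protocol v2 should be available"
    # Protocol v2 can be requested per command (it is the default since Git 2.26):
    #   git -c protocol.version=2 clone --filter=blob:none <url>
else
    echo "Git $current is older than 2.19; expect a full transfer"
fi
```

The server must also accept filters and protocol v2, which this client-side check cannot verify.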

That is further optimized in Git 2.20 (Q4 2018), since in a partial clone that will lazily be hydrated from the originating repository, we generally want to avoid "does this object exist (locally)?" on objects that we deliberately omitted when we created the (partial/sparse) clone.
The cache-tree codepath (which is used to write a tree object out of the index) however insisted that the object exists, even for paths that are outside of the partial checkout area.
The code has been updated to avoid such a check.

See commit 2f215ff (09 Oct 2018) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit a08b1d6, 19 Oct 2018)

cache-tree: skip some blob checks in partial clone

In a partial clone, whenever a sparse checkout occurs, the existence of all blobs in the index is verified, whether they are included or excluded by the .git/info/sparse-checkout specification.
This significantly degrades performance because a lazy fetch occurs whenever the existence of a missing blob is checked.
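
To see which objects a partial clone has omitted without triggering a lazy fetch, `git rev-list --objects --missing=print` can be used. The sketch below builds a throwaway local "server" repository (all paths and file names are made up for illustration), clones it with `--filter=blob:none`, and counts the blobs that are still missing locally. It assumes the server side allows filters, which is why `uploadpack.allowFilter` is set explicitly.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway "server" repository with two blobs in one commit.
git init -q "$work/server"
cd "$work/server"
git config user.name "Demo"
git config user.email "demo@example.com"
# Allow clients to request object filters and to lazily fetch by object ID.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit -q -m "two files"

# Partial clone without blobs; --no-checkout keeps Git from fetching them.
git clone -q --no-checkout --filter=blob:none "file://$work/server" "$work/partial"
cd "$work/partial"

# --missing=print lists omitted objects with a leading "?" instead of
# lazily fetching them from the promisor remote.
missing=$(git rev-list --objects --missing=print HEAD | grep -c '^?')
echo "blobs not yet downloaded: $missing"
```

Any command that needs the content of one of these blobs (a checkout, for example) would fetch it on demand from the remote.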


With Git 2.24 (Q4 2019), the cache-tree code has been taught to be less aggressive in attempting to see if a tree object it computed already exists in the repository.

See commit f981ec1 (03 Sep 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit ae203ba, 07 Oct 2019)

cache-tree: do not lazy-fetch tentative tree

The cache-tree datastructure is used to speed up the comparison between the HEAD and the index, and when the index is updated by a cherry-pick (for example), a tree object that would represent the paths in the index in a directory is constructed in-core, to see if such a tree object exists already in the object store.

When the lazy-fetch mechanism was introduced, we converted this "does the tree exist?" check into an "if it does not, and if we lazily cloned, see if the remote has it" call by mistake.
Since the whole point of this check is to repair the cache-tree by recording an already existing tree object opportunistically, we shouldn't even try to fetch one from the remote.

Pass the OBJECT_INFO_SKIP_FETCH_OBJECT flag to make sure we only check for existence in the local object store without triggering the lazy fetch mechanism.


With Git 2.25 (Q1 2020), "git fetch" codepath had a big "do not lazily fetch missing objects when I ask if something exists" switch.

This has been corrected by marking the "does this thing exist?" calls with "if not please do not lazily fetch it" flag.

See commit 603960b, commit e362fad (13 Nov 2019), and commit 6462d5e (05 Nov 2019) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit fce9e83, 01 Dec 2019)

clone: remove fetch_if_missing=0

Signed-off-by: Jonathan Tan

Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08) strove to remove the need for fetch_if_missing=0 from the fetching mechanism, so it is plausible to attempt removing fetch_if_missing=0 from clone as well. But doing so reveals a bug - when the server does not send an object directly pointed to by a ref, this should be an error, not a trigger for a lazy fetch. (This case in the fetching mechanism was covered by a test using "git clone", not "git fetch", which is why the aforementioned commit didn't uncover the bug.)

The bug can be fixed by suppressing lazy-fetching during the connectivity check. Fix this bug, and remove fetch_if_missing from clone.

And:

promisor-remote: remove fetch_if_missing=0

Signed-off-by: Jonathan Tan

Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08) strove to remove the need for fetch_if_missing=0 from the fetching mechanism, so it is plausible to attempt removing fetch_if_missing=0 from the lazy-fetching mechanism in promisor-remote as well.

But doing so reveals a bug - when the server does not send an object pointed to by a tag object, an infinite loop occurs: Git attempts to fetch the missing object, which causes a dereferencing of all refs (for negotiation), which causes a lazy fetch of that missing object, and so on.
This bug is because of unnecessary use of the fetch negotiator during lazy fetching - it is not used after initialization, but it is still initialized (which causes the dereferencing of all refs).

Thus, when the negotiator is not used during fetching, refrain from initializing it. Then, remove fetch_if_missing from promisor-remote.


See more with "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee

Pairing sparse-checkout with the partial clone feature accelerates these workflows even more.
This combination speeds up the data transfer process since you don't need every reachable Git object, and instead, can download only those you need to populate your cone of the working directory.

$ git clone --filter=blob:none --no-checkout https://github.com/derrickstolee/sparse-checkout-example
Cloning into 'sparse-checkout-example'...
Receiving objects: 100% (373/373), 75.98 KiB | 2.71 MiB/s, done.
Resolving deltas: 100% (23/23), done.

$ cd sparse-checkout-example/

$ git sparse-checkout init --cone
Receiving objects: 100% (3/3), 1.41 KiB | 1.41 MiB/s, done.

$ git sparse-checkout set client/android
Receiving objects: 100% (26/26), 985.91 KiB | 5.76 MiB/s, done.
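
The same idea can be combined with a shallow clone to get the "shallow AND sparse" result the question asks for. The sketch below is an illustration under stated assumptions rather than the blog post's own recipe: it uses a throwaway local repository (the `README`, `client/android` and `service` paths are made up) and `git clone --sparse`, which needs Git 2.25 or later, in place of `--no-checkout`.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway "server" repository with two commits and two top-level directories.
git init -q "$work/server"
cd "$work/server"
git config user.name "Demo"
git config user.email "demo@example.com"
# Allow clients to request object filters and to lazily fetch by object ID.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
mkdir -p client/android service
echo readme > README
echo app > client/android/app.txt
echo svc > service/svc.txt
git add .
git commit -q -m "first"
echo more >> service/svc.txt
git commit -q -a -m "second"

# Shallow (--depth 1), partial (--filter=blob:none) and sparse (--sparse).
git clone -q --depth 1 --filter=blob:none --sparse "file://$work/server" "$work/narrow"
cd "$work/narrow"

# Switch to cone mode and widen the checkout to a single directory.
git sparse-checkout init --cone
git sparse-checkout set client/android

commits=$(git rev-list --count HEAD)
echo "commits in clone: $commits"
ls
```

Only one commit is present locally, only the blobs needed for `README` and `client/android` are downloaded, and `service` never appears in the working tree.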


Before Git 2.25.1 (Feb. 2020), has_object_file() said "no" given an object registered to the system via pretend_object_file(), making it inconsistent with read_object_file(), causing lazy fetch to attempt fetching an empty tree from promisor remotes.

See discussion.

I tried to reproduce this with

empty_tree=$(git mktree </dev/null)
git init --bare x
git clone --filter=blob:none file://$(pwd)/x y
cd y
echo hi >README
git add README
git commit -m 'nonempty tree'
GIT_TRACE=1 git diff-tree "$empty_tree" HEAD

and indeed, it looks like Git serves the empty tree even from repositories that don't contain it.

See commit 9c8a294 (02 Jan 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit e26bd14, 22 Jan 2020)

sha1-file: remove OBJECT_INFO_SKIP_CACHED

Signed-off-by: Jonathan Tan

In a partial clone, if a user provides the hash of the empty tree ("git mktree</dev/null" - for SHA-1, this is 4b825d...) to a command which requires that that object be parsed, for example:

git diff-tree 4b825d <a non-empty tree>

then Git will lazily fetch the empty tree, unnecessarily, because parsing of that object invokes repo_has_object_file(), which does not special-case the empty tree.

Instead, teach repo_has_object_file() to consult find_cached_object() (which handles the empty tree), thus bringing it in line with the rest of the object-store-accessing functions.
A cost is that repo_has_object_file() will now need to oideq upon each invocation, but that is trivial compared to the filesystem lookup or the pack index search required anyway. (And if find_cached_object() needs to do more because of previous invocations to pretend_object_file(), all the more reason to be consistent in whether we present cached objects.)

As a historical note, the function now known as repo_read_object_file() was taught the empty tree in 346245a1bb ("hard-code the empty tree object", 2008-02-13, Git v1.5.5-rc0 -- merge), and the function now known as oid_object_info() was taught the empty tree in c4d9986f5f ("sha1_object_info: examine cached_object store too", 2011-02-07, Git v1.7.4.1).

repo_has_object_file() was never updated, perhaps due to oversight.
The flag OBJECT_INFO_SKIP_CACHED, introduced later in dfdd4afcf9 ("sha1_file: teach sha1_object_info_extended more flags", 2017-06-26, Git v2.14.0-rc0) and used in e83e71c5e1 ("sha1_file: refactor has_sha1_file_with_flags", 2017-06-26, Git v2.14.0-rc0), was introduced to preserve this difference in empty-tree handling, but now it can be removed.


Git 2.25.1 will also warn programmers about pretend_object_file() that allows the code to tentatively use in-core objects.

See commit 60440d7 (04 Jan 2020) by Jonathan Nieder (jrn).
(Merged by Junio C Hamano -- gitster -- in commit b486d2e, 12 Feb 2020)

sha1-file: document how to use pretend_object_file

Inspired-by: Junio C Hamano
Signed-off-by: Jonathan Nieder

Like in-memory alternates, pretend_object_file contains a trap for the unwary: careless callers can use it to create references to an object that does not exist in the on-disk object store.

Add a comment documenting how to use the function without risking such problems.

The only current caller is blame, which uses pretend_object_file to create an in-memory commit representing the working tree state. Noticed during a discussion of how to safely use this function in operations like "git merge" which, unlike blame, are not read-only.

So this comment is now:

/*
 * Add an object file to the in-memory object store, without writing it
 * to disk.
 *
 * Callers are responsible for calling write_object_file to record the
 * object in persistent storage before writing any other new objects
 * that reference it.
 */
int pretend_object_file(void *, unsigned long, enum object_type,
            struct object_id *oid);


Git 2.25.1 (Feb. 2020) includes future-proofing to make sure a test does not depend on current implementation details.

See commit b54128b (13 Jan 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 3f7553a, 12 Feb 2020)

t5616: make robust to delta base change

Signed-off-by: Jonathan Tan

Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08) contains a test that relies on having to lazily fetch the delta base of a blob, but assumes that the tree being fetched (as part of the test) is sent as a non-delta object.
This assumption may not hold in the future; for example, a change in the length of the object hash might result in the tree being sent as a delta instead.

Make the test more robust by relying on having to lazily fetch the delta base of the tree instead, and by making no assumptions on whether the blobs are sent as delta or non-delta.


Git 2.25.2 (March 2020) fixes a bug revealed by a recent change to make the protocol v2 the default.

See commit 3e96c66, commit d0badf8 (21 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 444cff6, 02 Mar 2020)

partial-clone: avoid fetching when looking for objects

Signed-off-by: Derrick Stolee

While testing partial clone, I noticed some odd behavior. I was testing a way of running 'git init', followed by manually configuring the remote for partial clone, and then running 'git fetch'.
Astonishingly, I saw the 'git fetch' process start asking the server for multiple rounds of pack-file downloads! When tweaking the situation a little more, I discovered that I could cause the remote to hang up with an error.
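
The setup described here ('git init', manual partial-clone configuration, then 'git fetch') can be sketched as follows. This is an illustration, not the test added by the commit: the local "server" repository and its file names are made up, and it assumes a Git version that understands the `remote.<name>.promisor` and `remote.<name>.partialclonefilter` settings.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Throwaway "server" repository with two blobs in one commit.
git init -q "$work/server"
cd "$work/server"
git config user.name "Demo"
git config user.email "demo@example.com"
# Allow clients to request object filters and to lazily fetch by object ID.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit -q -m "two files"

# Start from an empty repository instead of "git clone --filter".
git init -q "$work/manual"
cd "$work/manual"
git remote add origin "file://$work/server"
# Mark origin as a promisor remote and record its default filter.
git config remote.origin.promisor true
git config remote.origin.partialclonefilter blob:none
git fetch -q origin

# Count objects that the filtered fetch left out, without lazily fetching them.
missing=$(git rev-list --objects --missing=print --all | grep -c '^?')
echo "blobs omitted by the fetch: $missing"
```

On a Git version with the fixes described in this answer, the fetch should complete in a single pack download and leave the blobs to be fetched on demand.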

Add two tests that demonstrate these two issues.

In the first test, we find that when fetching with blob filters from a repository that previously did not have any tags, the 'git fetch --tags origin' command fails because the server sends "multiple filter-specs cannot be combined". This only happens when using protocol v2.

In the second test, we see that a 'git fetch origin' request with several ref updates results in multiple pack-file downloads.
This must be due to Git trying to fault-in the objects pointed by the refs. What makes this matter particularly nasty is that this goes through the do_oid_object_info_extended() method, so there are no "haves" in the negotiation.
This leads the remote to send every reachable commit and tree from each new ref, providing a quadratic amount of data transfer! This test is fixed if we revert 6462d5eb9a (fetch: remove fetch_if_missing=0, 2019-11-05, Git v2.25.0-rc0), but that revert causes other test failures.
The real fix will need more care.

The fix:

When using partial clone, find_non_local_tags() in builtin/fetch.c checks each remote tag to see if its object also exists locally. There is no expectation that the object exist locally, but this function nevertheless triggers a lazy fetch if the object does not exist. This can be extremely expensive when asking for a commit, as we are completely removed from the context of the non-existent object and thus supply no "haves" in the request.

6462d5eb9a (fetch: remove fetch_if_missing=0, 2019-11-05, Git v2.25.0-rc0) removed a global variable that prevented these fetches in favor of a bitflag. However, some object existence checks were not updated to use this flag.

Update find_non_local_tags() to use OBJECT_INFO_SKIP_FETCH_OBJECT in addition to OBJECT_INFO_QUICK.
The _QUICK option only prevents repreparing the pack-file structures. We need to be extremely careful about supplying _SKIP_FETCH_OBJECT when we expect an object to not exist due to updated refs.

This resolves a broken test in t5616-partial-clone.sh.

The logic to auto-follow tags by "git clone --single-branch" was not careful to avoid lazy-fetching unnecessary tags, which has been corrected with Git 2.27 (Q2 2020).

See commit 167a575 (01 Apr 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 3ea2b46, 22 Apr 2020)

clone: use "quick" lookup while following tags

Signed-off-by: Jeff King

When cloning with --single-branch, we implement git fetch's usual tag-following behavior, grabbing any tag objects that point to objects we have locally.

When we're a partial clone, though, our has_object_file() check will actually lazy-fetch each tag.

That not only defeats the purpose of --single-branch, but it does it incredibly slowly, potentially kicking off a new fetch for each tag.
This is even worse for a shallow clone, which implies --single-branch, because even tags which are supersets of each other will be fetched individually.

We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call, which is what git fetch does in this case.

Likewise, let's include OBJECT_INFO_QUICK, as that's what git fetch does.
The rationale is discussed in 5827a03545 (fetch: use "quick" has_sha1_file for tag following, 2016-10-13, Git v2.10.2), but here the tradeoff would apply even more so because clone is very unlikely to be racing with another process repacking our newly-created repository.

This may provide a very small speedup even in the non-partial case, as we'd avoid calling reprepare_packed_git() for each tag (though in practice, we'd only have a single packfile, so that reprepare should be quite cheap).


Before Git 2.27 (Q2 2020), serving a "git fetch" client over "git://" and "ssh://" protocols using the on-wire protocol version 2 was buggy on the server end when the client needs to make a follow-up request to e.g. auto-follow tags.

See commit 08450ef (08 May 2020) by Christian Couder (chriscool).
(Merged by Junio C Hamano -- gitster -- in commit a012588, 13 May 2020)

upload-pack: clear filter_options for each v2 fetch command

Helped-by: Derrick Stolee
Helped-by: Jeff King
Helped-by: Taylor Blau
Signed-off-by: Christian Couder

Because of the request/response model of protocol v2, the upload_pack_v2() function is sometimes called twice in the same process, while 'struct list_objects_filter_options filter_options' was declared as static at the beginning of 'upload-pack.c'.

This made the check in list_objects_filter_die_if_populated(), which is called by process_args(), fail the second time upload_pack_v2() is called, as filter_options had already been populated the first time.

To fix that, filter_options is not static any more. It's now owned directly by upload_pack(). It's now also part of 'struct upload_pack_data', so that it's owned indirectly by upload_pack_v2().

In the long term, the goal is to also have upload_pack() use 'struct upload_pack_data', so adding filter_options to this struct makes more sense than to have it owned directly by upload_pack_v2().

This fixes the first of the 2 bugs documented by d0badf8797 ("partial-clone: demonstrate bugs in partial fetch", 2020-02-21, Git v2.26.0-rc0 -- merge listed in batch #8).
