Git sparse checkout with exclusion


Problem description

According to this thread, exclusion in Git's sparse-checkout feature is supposed to be implemented. Is it?

Assume that I have the following structure:

papers/
papers/...
presentations/
presentations/heavy_presentation
presentations/...

Now I want to exclude presentations/heavy_presentation from the checkout, while leaving the rest in the checkout. I haven't managed to get this running. What's the right syntax for this?

Solution

With Git 2.25 (Q1 2020), management of sparsely checked-out working trees has gained a dedicated "sparse-checkout" command.

First, here is an extended example, starting with a fast clone using a --filter option:

git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD

Using the cone option (detailed/documented below) means your .git/info/sparse-checkout will include patterns starting with:

/*
!/*/

Meaning: only top-level files, no subfolders.
If you do not want the top-level files, you need to avoid cone mode:

# Disable cone mode in .git/config.worktree
git config core.sparseCheckoutCone false

# remove .git/info/sparse-checkout
git sparse-checkout disable

# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/

# populate working-tree with only the right files:
git read-tree -mu HEAD


In detail:

(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)

So not only does excluding a subfolder work, it also works faster with the "cone" mode of sparse checkout (with Git 2.25).
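For the layout in the question, "everything except one folder" can be written as a pair of full (non-cone) patterns. The following is a minimal sketch rather than part of the answer above: it assumes a throwaway repository with made-up file names, writes the pattern file by hand, and applies it with the same `git read-tree -mu HEAD` call used earlier.

```shell
#!/bin/sh
set -eu

# Throwaway repository mirroring the question's layout (file names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p papers presentations/heavy_presentation presentations/light_presentation
echo paper > papers/paper.txt
echo heavy > presentations/heavy_presentation/slides.txt
echo light > presentations/light_presentation/slides.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

# Full (non-cone) pattern set: include everything, then exclude one directory.
# Order matters: a later pattern overrides an earlier one.
git config core.sparseCheckout true
git config core.sparseCheckoutCone false
mkdir -p .git/info
printf '%s\n' '/*' '!/presentations/heavy_presentation/' > .git/info/sparse-checkout

# Re-apply the patterns to the working tree.
git read-tree -mu HEAD

ls presentations
```

Cone mode can only include directories, so getting the same result there would mean listing the wanted siblings explicitly, for example `git sparse-checkout set papers presentations/light_presentation` for this hypothetical layout.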

See commit 761e3d2 (20 Dec 2019) by Ed Maste (emaste).
See commit 190a65f (13 Dec 2019), and commit cff4e91, commit 416adc8, commit f75a69f, commit fb10ca5, commit 99dfa6f, commit e091228, commit e9de487, commit 4dcd4de, commit eb42fec, commit af09ce2, commit 96cc8ab, commit 879321e, commit 72918c1, commit 7bffca9, commit f6039a9, commit d89f09c, commit bab3c35, commit 94c0956 (21 Nov 2019) by Derrick Stolee (derrickstolee).
See commit e6152e3 (21 Nov 2019) by Jeff Hostetler (Jeff-Hostetler).
(Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019)

sparse-checkout: add 'cone' mode

Signed-off-by: Derrick Stolee

The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.

Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.

The config man page includes:

`core.sparseCheckoutCone`:

Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.

The git sparse-checkout man page details:

CONE PATTERN SET

The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.sparseCheckoutCone is enabled.

The accepted patterns in the cone pattern set are:

  1. Recursive: All paths inside a directory are included.
  2. Parent: All files immediately inside a directory are included.

In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.

By default, when running git sparse-checkout init, the root directory is added as a parent pattern. At this point, the sparse-checkout file contains the following patterns:

/*
!/*/

This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now

/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/

Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file.
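The expansion described above (one recursive directory plus a parent pattern pair for each leading directory) can be sketched as a small shell function. This only illustrates the pattern layout and is not how Git generates the file; the function name `cone_patterns` is made up for the example.

```shell
#!/bin/sh
# Print the cone-mode pattern list for one recursive directory, e.g. A/B/C.
cone_patterns() {
    dir=${1#/}          # drop a leading slash, if any
    dir=${dir%/}        # drop a trailing slash, if any

    # The root directory is always a parent pattern.
    printf '%s\n' '/*' '!/*/'

    # Every leading directory becomes a parent pattern pair.
    prefix=
    rest=$dir
    while [ "${rest#*/}" != "$rest" ]; do
        head=${rest%%/*}
        rest=${rest#*/}
        prefix="$prefix/$head"
        printf '%s\n' "$prefix/" "!$prefix/*/"
    done

    # The directory itself is the recursive pattern.
    printf '%s\n' "/$dir/"
}

cone_patterns A/B/C
```

Running it for `A/B/C` prints the same seven lines as the sparse-checkout file shown above.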

If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.

So:

sparse-checkout: init and set in cone mode

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.


Note, the --cone option is only documented in Git 2.26 (Q1 2020)
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)

doc: sparse-checkout: mention --cone option

Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee

In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.

Document it in git sparse-checkout:

That includes:

When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.

("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)


How much faster is this new "cone" mode?

sparse-checkout: use hashmaps for cone patterns

Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match.
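The prefix lookup described here can be imitated in a few lines of shell. This is only a sketch of the idea: it uses newline-separated lists and `grep -x` in place of real hashsets, and the set contents and function names are assumptions chosen to match the A/B/C example.

```shell
#!/bin/sh
# Hypothetical cone: A/B/C is recursive; the root, A and A/B are parents.
recursive_set='/A/B/C'
parent_set='/
/A
/A/B'

# Exact-match lookup of $1 in the newline-separated list $2.
in_set() { printf '%s\n' "$2" | grep -qxF -- "$1"; }

# Succeed if the file path $1 (relative to the repository root) is in the cone.
in_cone() {
    dir=/$(dirname -- "$1")            # directory that holds the file
    [ "$dir" = "/." ] && dir=/         # file sits in the root directory

    # Parent match: the file is immediately inside a parent directory.
    in_set "$dir" "$parent_set" && return 0

    # Recursive match: strip trailing components and look for an exact hit.
    while [ -n "$dir" ] && [ "$dir" != "/" ]; do
        in_set "$dir" "$recursive_set" && return 0
        dir=${dir%/*}
    done
    return 1
}

for path in README A/a.txt A/B/C/deep/file.c A/other/x.txt D/d.txt; do
    if in_cone "$path"; then echo "in cone:     $path"; else echo "not in cone: $path"; fi
done
```

Each check costs a number of exact lookups bounded by the path depth, which is what removes the dependence on the total number of patterns.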

As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here:

/$folder/
!/$folder/*/

This resulted in a sparse-checkout file with 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following performance:

core.sparseCheckout=false:     0.21 s (0.00 s)
core.sparseCheckout=true:      3.75 s (3.50 s)
core.sparseCheckoutCone=true:  0.23 s (0.01 s)

The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces.

While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature.

And:

sparse-checkout: respect core.ignoreCase in cone mode

Signed-off-by: Derrick Stolee

When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin.
This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match.
Do the same for the sparse-checkout feature.

Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case-insensitive methods.
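As a rough illustration of an optionally case-insensitive comparison, the sketch below folds both names to lower case before comparing them when the flag is on. It is a stand-in for the idea only; Git changes its hash and comparison functions in C, and the helper names here are invented.

```shell
#!/bin/sh
# Fold a path to lower case; stands in for a case-insensitive hash/compare.
fold_case() { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'; }

# Compare a candidate directory with a cone directory.
# $1 = value of core.ignoreCase (true/false), $2 and $3 = directory names.
dir_matches() {
    if [ "$1" = true ]; then
        [ "$(fold_case "$2")" = "$(fold_case "$3")" ]
    else
        [ "$2" = "$3" ]
    fi
}

if dir_matches true Documentation documentation; then
    echo "ignoreCase=true: match"
fi
if ! dir_matches false Documentation documentation; then
    echo "ignoreCase=false: no match"
fi
```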

When this is enabled, there is a small performance cost in the hashing algorithm.
To tease out the worst possible case, the following was run on a repo with a deep directory structure:

git ls-tree -d -r --name-only HEAD |
git sparse-checkout set --stdin

The 'set' command was timed with core.ignoreCase disabled or enabled.
For the repo with a deep history, the numbers were

core.ignoreCase=false: 62s
core.ignoreCase=true:  74s (+19.3%)

For reproducibility, the equivalent test on the Linux kernel repository had these numbers:

core.ignoreCase=false: 3.1s
core.ignoreCase=true:  3.6s (+16%)

Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21, Git 2.25-rc0) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case.

Thus, we can demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone.


With Git 2.25 (Q1 2020), "git sparse-checkout list" subcommand learned to give its output in a more concise form when the "cone" mode is in effect.

See commit 4fd683b, commit de11951 (30 Dec 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c20d4fd, 06 Jan 2020)

sparse-checkout: list directories in cone mode

Signed-off-by: Derrick Stolee

When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included.
Listing the patterns is less user-friendly than the directories themselves.

In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns.

With this change, the following piped commands would not change the working directory:

git sparse-checkout list | git sparse-checkout set --stdin

The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode.
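The round trip can be tried in a scratch repository. This sketch assumes Git 2.25 or newer and uses made-up paths; it records the listed directories before and after piping `list` back into `set`.

```shell
#!/bin/sh
set -eu

# Throwaway repository with hypothetical paths; needs Git 2.25 or newer.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p A/B/C D
echo root > root.txt
echo a > A/a.txt
echo c > A/B/C/c.txt
echo d > D/d.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

git sparse-checkout init --cone
git sparse-checkout set A/B/C

# In cone mode, list prints directories rather than raw patterns.
before=$(git sparse-checkout list)

# Feeding the list back into set should leave the cone unchanged.
git sparse-checkout list | git sparse-checkout set --stdin
after=$(git sparse-checkout list)

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```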


The code added in Git 2.25 to skip past the index entries of a directory in sparse cone mode miscounted the number of entries to skip over; this has been corrected in Git 2.25.1 (Feb. 2020).

See commit 7210ca4 (27 Jan 2020) by Junio C Hamano (gitster).
See commit 4c6c797 (10 Jan 2020) by Derrick Stolee via GitGitGadget.
(Merged by Junio C Hamano -- gitster -- in commit 043426c, 30 Jan 2020)

unpack-trees: correctly compute result count

Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee

The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory.
When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded.

eb42feca ("unpack-trees: hash less in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge listed in batch #0) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache_end" from "cache" to find the size of the range.
However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works!

A coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning:

Pointer differences, such as `cache_end` - cache, are automatically 
scaled down by the size (8 bytes) of the pointed-to type (struct `cache_entry` *). 
Most likely, the division by sizeof(struct `cache_entry` *) is extraneous 
and should be eliminated.

This warning is correct.

This leaves us with the question "how did this even work?"

The problem that occurs with this incorrect pointer arithmetic is a performance-only bug, and a very slight one at that.
Since the entry count returned by clear_ce_flags_dir() is reduced by a factor of 8, the loop in clear_ce_flags_1() will re-process entries from those directories.

By inserting global counters into unpack-trees.c and tracing them with trace2_data_intmax() (in a private change, for testing), I was able to count how many times the loop inside clear_ce_flags_1() processed an entry and how many times clear_ce_flags_dir() was called.
Each of these is reduced by at least a factor of 8 with the current change.
A factor larger than 8 happens when multiple levels of directories are repeated.
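The effect of returning an entry count that is too small can be seen in a toy model. The function below is not Git's code: it simply counts how many passes a loop needs to get past one directory when the "entries processed" value it receives is divided by a given factor, on the assumption that a zero count falls back to handling a single entry.

```shell
#!/bin/sh
# Toy model: $1 = index entries in one directory,
#            $2 = divisor applied to the returned count (1 = correct, 8 = buggy).
passes_over_dir() {
    remaining=$1
    passes=0
    while [ "$remaining" -gt 0 ]; do
        passes=$((passes + 1))
        skipped=$((remaining / $2))
        # A zero count falls back to handling a single entry.
        [ "$skipped" -gt 0 ] || skipped=1
        remaining=$((remaining - skipped))
    done
    echo "$passes"
}

echo "correct count: $(passes_over_dir 64 1) pass(es)"
echo "count / 8:     $(passes_over_dir 64 8) pass(es)"
```

With the correct count the directory is passed in a single step; with the divided count the loop comes back to the same directory many times, which matches the "performance-only bug" described in the commit message.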

Specifically, in the Linux kernel repo, the command

git sparse-checkout set LICENSES

restricts the working directory to only the files at root and in the LICENSES directory.
Here are the measured counts:

clear_ce_flags_1 loop blocks:

Before: 11,520
After:   1,621

clear_ce_flags_dir calls:

Before: 7,048
After:    606

While these are dramatic counts, the time spent in clear_ce_flags_1() is under one millisecond in each case, so the improvement is not measurable as an end-to-end time.


With Git 2.26 (Q1 2020), some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up.

See commit f998a3f, commit d2e65f4, commit e53ffe2, commit e55682e, commit bd64de4, commit d585f0e, commit 4f52c2c, commit 9abc60f (31 Jan 2020), and commit 9e6d3e6, commit 41de0c6, commit 47dbf10, commit 3c75406, commit d622c34, commit 522e641 (24 Jan 2020) by Derrick Stolee (derrickstolee).
See commit 7aa9ef2 (24 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 433b8aa, 14 Feb 2020)

sparse-checkout: fix cone mode behavior mismatch

Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee

The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled.

When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone.

Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression.

The documentation now includes:

When core.sparseCheckoutCone is enabled, the input list is considered a list of directories instead of sparse-checkout patterns.
The command writes patterns to the sparse-checkout file to include all files contained in those directories (recursively) as well as files that are siblings of ancestor directories.
The input format matches the output of git ls-tree --name-only. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.


With Git 2.26 (Q1 2020), "git sparse-checkout" learned a new "add" subcommand.

See commit 6c11c6a (20 Feb 2020), and commit ef07659, commit 2631dc8, commit 4bf0c06, commit 6fb705a (11 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4d7dfc, 05 Mar 2020)

sparse-checkout: create 'add' subcommand

Signed-off-by: Derrick Stolee

When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set.
Allow adding patterns using a new 'add' subcommand.

This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise.

When in cone mode, we are growing the cone.
This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors.

When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file.
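A short session shows the cone growing. This is a sketch with hypothetical paths in a scratch repository, and it assumes Git 2.26 or newer for the `add` subcommand.

```shell
#!/bin/sh
set -eu

# Throwaway repository with hypothetical paths; 'add' needs Git 2.26 or newer.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p A/B A/X D
echo root > root.txt
echo a > A/a.txt
echo b > A/B/b.txt
echo x > A/X/x.txt
echo d > D/d.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial layout"

git sparse-checkout init --cone

# Cone: root files, files directly inside A, and everything under A/B.
git sparse-checkout set A/B
if [ -e A/X ]; then x_before=present; else x_before=absent; fi

# Grow the cone to cover all of A, which also brings in the sibling A/X.
git sparse-checkout add A

echo "A/X before add: $x_before"
git sparse-checkout list
cat .git/info/sparse-checkout
```

Printing the pattern file at the end makes it possible to check on a given Git version whether the pattern set shrank after adding the parent directory, as the commit message suggests it may.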

And:

sparse-checkout: work with Windows paths

Signed-off-by: Derrick Stolee

When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path 'A/B/C' to their sparse-checkout patterns.

Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset.
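The conversion itself is a simple character substitution. The helper below is a hypothetical stand-in for Git's internal path normalization and only shows the backslash-to-slash step.

```shell
#!/bin/sh
# Convert backslashes to forward slashes in a path given as $1.
normalize_path() { printf '%s\n' "$1" | tr '\\' '/'; }

normalize_path 'A\B\C'    # prints A/B/C
```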


The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.

With Git 2.27 (Q2 2020), this limitation has been lifted.

See commit ace224a (04 May 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e9acbd6, 08 May 2020)

sparse-checkout: stop blocking empty workdirs

Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee

Remove the error condition when updating the sparse-checkout leaves an empty working directory.

This behavior was added in 9e1afb167 ("sparse checkout: inhibit empty worktree", 2009-08-20, Git v1.7.0-rc0 -- merge).

The comment was added in a7bc906f2 ("Add explanation why we do not allow to sparse checkout to empty working tree", 2011-09-22, Git v1.7.8-rc0 -- merge) in response to a "dubious" comment in 84563a624 ("[unpack-trees.c](https://github.com/git/git/blob/ace224ac5fb120e9cae894e31713ab60e91f141f/unpack-trees.c): cosmetic fix", 2010-12-22, Git v1.7.5-rc0 -- merge).

With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of

/*
!/*/

which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail.

Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory.
If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode.
