Git sparse checkout with exclusion
Question
According to this thread, exclusion in Git's sparse-checkout feature is supposed to be implemented. Is it?
Assume that I have the following structure:
papers/
papers/...
presentations/
presentations/heavy_presentation
presentations/...
Now I want to exclude presentations/heavy_presentation from the checkout, while leaving the rest in the checkout. I haven't managed to get this running. What's the right syntax for this?
Answer
With Git 2.25 (Q1 2020), management of sparsely checked-out working trees has gained a dedicated "sparse-checkout" command.
First, here is an extended example, starting with a fast clone using the --filter option:
git clone --filter=blob:none --no-checkout https://github.com/git/git
cd git
git sparse-checkout init --cone
# that sets git config core.sparseCheckoutCone true
git read-tree -mu HEAD
Using the cone option (detailed/documented below) means your .git/info/sparse-checkout will include patterns starting with:
/*
!/*/
Meaning: only top-level files, no subfolders.
If you do not want top-level files, you need to avoid cone mode:
# Disable cone mode in .git/config.worktree
git config core.sparseCheckoutCone false
# remove .git/info/sparse-checkout
git sparse-checkout disable
# Add the expected pattern, to include just a subfolder without top files:
git sparse-checkout set /mySubFolder/
# populate working-tree with only the right files:
git read-tree -mu HEAD
In detail:
(See more at "Bring your monorepo down to size with sparse-checkout" from Derrick Stolee)
So not only does excluding a subfolder work, but it will work faster with the "cone" mode of sparse checkout (with Git 2.25).
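As a direct answer to the question, here is a minimal sketch of the non-cone route. It builds a throwaway repository whose file names (paper.tex, slides.pdf, presentations/light) are hypothetical stand-ins for the layout above, writes .git/info/sparse-checkout by hand and applies it with git read-tree -mu HEAD, which avoids depending on the newer sparse-checkout subcommands. The /* line includes everything; the negative pattern below it then excludes presentations/heavy_presentation/. Cone mode only takes directories to include, so the sketch switches it off explicitly.

```shell
#!/bin/sh
# Minimal sketch: exclude one nested folder with a non-cone pattern.
# The repository layout and file names are hypothetical stand-ins.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p papers presentations/heavy_presentation presentations/light
echo paper > papers/paper.tex
echo big > presentations/heavy_presentation/slides.pdf
echo small > presentations/light/slides.pdf
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Enable sparse checkout with the classic (non-cone) pattern syntax.
git config core.sparseCheckout true
git config core.sparseCheckoutCone false
mkdir -p .git/info
# Include everything, then exclude the heavy folder (later lines win).
cat > .git/info/sparse-checkout <<'EOF'
/*
!/presentations/heavy_presentation/
EOF
# Apply the patterns to the index and the working tree.
git read-tree -mu HEAD
git ls-files -t
```

Afterwards, git ls-files -t should tag the excluded file with S (skip-worktree) and the others with H, and the heavy folder should be gone from the working tree.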
See commit 761e3d2 (20 Dec 2019) by Ed Maste (emaste).
See commit 190a65f (13 Dec 2019), and commit cff4e91, commit 416adc8, commit f75a69f, commit fb10ca5, commit 99dfa6f, commit e091228, commit e9de487, commit 4dcd4de, commit eb42fec, commit af09ce2, commit 96cc8ab, commit 879321e, commit 72918c1, commit 7bffca9, commit f6039a9, commit d89f09c, commit bab3c35, commit 94c0956 (21 Nov 2019) by Derrick Stolee (derrickstolee).
See commit e6152e3 (21 Nov 2019) by Jeff Hostetler (Jeff-Hostetler).
(Merged by Junio C Hamano -- gitster -- in commit bd72a08, 25 Dec 2019)
sparse-checkout: add 'cone' mode
Signed-off-by: Derrick Stolee
The sparse-checkout feature can have quadratic performance as the number of patterns and number of entries in the index grow.
If there are 1,000 patterns and 1,000,000 entries, this time can be very significant.
Create a new Boolean config option, core.sparseCheckoutCone, to indicate that we expect the sparse-checkout file to contain a more limited set of patterns.
This is a separate config setting from core.sparseCheckout to avoid breaking older clients by introducing a tri-state option.
The config man page includes:
`core.sparseCheckoutCone`:
Enables the "cone mode" of the sparse checkout feature.
When the sparse-checkout file contains a limited set of patterns, then this mode provides significant performance advantages.
The git sparse-checkout man page details:
CONE PATTERN SET
The full pattern set allows for arbitrary pattern matches and complicated inclusion/exclusion rules.
These can result in O(N*M) pattern matches when updating the index, where N is the number of patterns and M is the number of paths in the index. To combat this performance issue, a more restricted pattern set is allowed when core.sparseCheckoutCone is enabled.
The accepted patterns in the cone pattern set are:
- Recursive: All paths inside a directory are included.
- Parent: All files immediately inside a directory are included.
In addition to the above two patterns, we also expect that all files in the root directory are included. If a recursive pattern is added, then all leading directories are added as parent patterns.
By default, when running git sparse-checkout init, the root directory is added as a parent pattern. At this point, the sparse-checkout file contains the following patterns:
/*
!/*/
This says "include everything in root, but nothing two levels below root."
If we then add the folder A/B/C as a recursive pattern, the folders A and A/B are added as parent patterns.
The resulting sparse-checkout file is now:
/*
!/*/
/A/
!/A/*/
/A/B/
!/A/B/*/
/A/B/C/
Here, order matters, so the negative patterns are overridden by the positive patterns that appear lower in the file.
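To make the pattern layout concrete, here is a small POSIX shell sketch that prints the cone-mode patterns for a single recursive directory, following the rules quoted above: the root parent pair first, then a parent pair for each leading directory, then the recursive pattern itself. The helper name cone_patterns is hypothetical; Git writes this file on its own, and the sketch only reproduces the documented shape for one directory.

```shell
#!/bin/sh
# Hypothetical helper: print the cone-mode patterns for ONE recursive
# directory given without leading or trailing slashes (e.g. A/B/C).
cone_patterns() {
  rest=$1
  prefix=
  # The root is always a parent pattern: files at root, no subfolders.
  printf '%s\n' '/*' '!/*/'
  while :; do
    case $rest in
      */*)
        # Peel off the next leading directory and emit its parent pair.
        prefix=$prefix/${rest%%/*}
        rest=${rest#*/}
        printf '%s/\n!%s/*/\n' "$prefix" "$prefix"
        ;;
      *)
        # The last component becomes the recursive pattern.
        printf '%s/%s/\n' "$prefix" "$rest"
        return 0
        ;;
    esac
  done
}

cone_patterns A/B/C
```

With a single top-level directory such as A, the same helper prints only /*, !/*/ and /A/, since there are no leading directories to add.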
If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types.
Git will warn if the patterns do not match.
If the patterns do match the expected format, then Git will use faster hash-based algorithms to compute inclusion in the sparse-checkout.
So:
sparse-checkout: init and set in cone mode
Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'.
Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'.
When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories.
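A hedged end-to-end sketch of that behavior, assuming Git 2.25 or later and a throwaway repository with a hypothetical A/B/C layout: only A/B/C is passed to set, and the file Git writes should also contain the parent patterns for A and A/B, exactly as in the CONE PATTERN SET excerpt.

```shell
#!/bin/sh
# Sketch, assuming Git 2.25 or later; the A/B/C layout is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p A/B/C A/D
echo root > top.txt
echo a > A/a.txt
echo d > A/D/d.txt
echo c > A/B/C/c.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

git sparse-checkout init --cone   # sets core.sparseCheckoutCone=true
git sparse-checkout set A/B/C     # only the recursive folder is given
cat .git/info/sparse-checkout     # parent patterns for A and A/B were added
git sparse-checkout list          # cone mode lists directories, not patterns
```

In the resulting working tree, top.txt, A/a.txt and A/B/C/c.txt should be present, while A/D/d.txt is left out because A is only a parent pattern.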
Note: the --cone option is only documented in Git 2.26 (Q1 2020)
(Merged by Junio C Hamano -- gitster -- in commit ea46d90, 05 Feb 2020)
doc: sparse-checkout: mention --cone option
Signed-off-by: Matheus Tavares
Acked-by: Derrick Stolee
In af09ce2 ("sparse-checkout: init and set in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge), the '--cone' option was added to 'git sparse-checkout init'.
Document it in git sparse-checkout.
That includes:
When --cone is provided, the core.sparseCheckoutCone setting is also set, allowing for better performance with a limited set of patterns.
("set of patterns" presented above, in the "CONE PATTERN SET" section of this answer)
How much faster would this new "cone" mode be?
sparse-checkout: use hashmaps for cone patterns
Helped-by: Eric Wong
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match.
As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here:
/$folder/
!/$folder/*/
This resulted in a sparse-checkout file with 8,296 patterns.
Running 'git read-tree -mu HEAD' on this file had the following performance:
core.sparseCheckout=false: 0.21 s (0.00 s)
core.sparseCheckout=true: 3.75 s (3.50 s)
core.sparseCheckoutCone=true: 0.23 s (0.01 s)
The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces.
While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature.
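A rough POSIX shell sketch of the prefix-stripping lookup that commit describes. It is not Git's code: the two sets are newline-separated strings and an exact whole-line grep -F match stands in for the hashset lookup, so it shows the matching logic (root files always included, parent directories match only their direct files, recursive directories match everything below them) rather than the performance. The set contents and helper names are hypothetical and mirror the A/B/C example above.

```shell
#!/bin/sh
# Hypothetical sets for the A/B/C example: one directory per line,
# no leading or trailing slash.
RECURSIVE='A/B/C'
PARENT='A
A/B'

# in_set SET DIR: exact whole-line match (stands in for a hashset lookup).
in_set() {
  printf '%s\n' "$1" | grep -qxF -- "$2"
}

# cone_includes FILE: succeed if FILE falls inside the cone.
cone_includes() {
  case $1 in
    */*) ;;               # file below some directory: keep checking
    *) return 0 ;;        # files at the root are always included
  esac
  d=${1%/*}               # immediate parent directory
  in_set "$PARENT" "$d" && return 0
  # Strip one trailing component at a time, looking for a recursive match.
  while :; do
    in_set "$RECURSIVE" "$d" && return 0
    case $d in
      */*) d=${d%/*} ;;
      *) return 1 ;;
    esac
  done
}

for f in top.txt A/a.txt A/D/d.txt A/B/C/D/e.txt Z/z.txt; do
  if cone_includes "$f"; then echo "include $f"; else echo "skip    $f"; fi
done
```

Here A/a.txt is kept because A is a parent pattern, A/D/d.txt is skipped because neither A/D nor any ancestor is recursive, and A/B/C/D/e.txt is kept because stripping components eventually reaches the recursive A/B/C.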
And:
sparse-checkout: respect core.ignoreCase in cone mode
Signed-off-by: Derrick Stolee
When a user uses the sparse-checkout feature in cone mode, they add patterns using "git sparse-checkout set <dir1> <dir2> ..." or by using "--stdin" to provide the directories line-by-line over stdin.
This behaviour naturally looks a lot like the way a user would type "git add <dir1> <dir2> ..."
If core.ignoreCase is enabled, then "git add" will match the input using a case-insensitive match.
Do the same for the sparse-checkout feature.
Perform case-insensitive checks while updating the skip-worktree bits during unpack_trees(). This is done by changing the hash algorithm and hashmap comparison methods to optionally use case-insensitive methods.
When this is enabled, there is a small performance cost in the hashing algorithm.
To tease out the worst possible case, the following was run on a repo with a deep directory structure:
git ls-tree -d -r --name-only HEAD | git sparse-checkout set --stdin
The 'set' command was timed with core.ignoreCase disabled or enabled.
For the repo with a deep history, the numbers were:
core.ignoreCase=false: 62s
core.ignoreCase=true: 74s (+19.3%)
For reproducibility, the equivalent test on the Linux kernel repository had these numbers:
core.ignoreCase=false: 3.1s
core.ignoreCase=true: 3.6s (+16%)
Now, this is not an entirely fair comparison, as most users will define their sparse cone using more shallow directories, and the performance improvement from eb42feca97 ("unpack-trees: hash less in cone mode" 2019-11-21, Git 2.25-rc0) can remove most of the hash cost. For a more realistic test, drop the "-r" from the ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in each case, and the deep repository takes one second, plus or minus 0.05s, in each case.
Thus, we can demonstrate a cost to this change, but it is unlikely to matter to any reasonable sparse-checkout cone.
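A tiny illustration of what case-insensitive matching changes in the exact-match lookup, sketched in POSIX shell with grep -i standing in for the case-insensitive hash comparison. The directory name and helper are hypothetical, and the sketch says nothing about Git's actual hash function or its cost.

```shell
#!/bin/sh
# Hypothetical recursive set containing a mixed-case directory name.
RECURSIVE='Docs/Api'

# in_set_icase SET DIR: whole-line, fixed-string match that ignores case,
# standing in for a case-insensitive hashmap comparison.
in_set_icase() {
  printf '%s\n' "$1" | grep -qixF -- "$2"
}

in_set_icase "$RECURSIVE" docs/api && echo "docs/api matches Docs/Api"
in_set_icase "$RECURSIVE" docs/other || echo "docs/other does not match"
```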
With Git 2.25 (Q1 2020), the "git sparse-checkout list" subcommand learned to give its output in a more concise form when the "cone" mode is in effect.
See commit 4fd683b, commit de11951 (30 Dec 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c20d4fd, 06 Jan 2020)
sparse-checkout: list directories in cone mode
Signed-off-by: Derrick Stolee
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set' command takes a list of directories as input, then creates an ordered list of sparse-checkout patterns such that those directories are recursively included and all sibling entries along the parent directories are also included.
Listing the patterns is less user-friendly than the directories themselves.
In cone mode, and as long as the patterns match the expected cone-mode pattern types, change the output of 'git sparse-checkout list' to only show the directories that created the patterns.
With this change, the following piped commands would not change the working directory:
git sparse-checkout list | git sparse-checkout set --stdin
The only time this would not work is if core.sparseCheckoutCone is true, but the sparse-checkout file contains patterns that do not match the expected pattern types for cone mode.
The code recently added in this release to move to the entry beyond the ones in the same directory in the index in the sparse-cone mode counted the number of entries to skip over incorrectly; this has been corrected with Git 2.25.1 (Feb. 2020).
See commit 7210ca4 (27 Jan 2020) by Junio C Hamano (gitster).
See commit 4c6c797 (10 Jan 2020) by Derrick Stolee via GitGitGadget.
(Merged by Junio C Hamano -- gitster -- in commit 043426c, 30 Jan 2020)
unpack-trees: correctly compute result count
Reported-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
The clear_ce_flags_dir() method processes the cache entries within a common directory. The returned int is the number of cache entries processed by that directory.
When using the sparse-checkout feature in cone mode, we can skip the pattern matching for entries in the directories that are entirely included or entirely excluded.
eb42feca ("unpack-trees: hash less in cone mode", 2019-11-21, Git v2.25.0-rc0 -- merge listed in batch #0) introduced this performance feature. The old mechanism relied on the counts returned by calling clear_ce_flags_1(), but the new mechanism calculated the number of rows by subtracting "cache" from "cache_end" to find the size of the range.
However, the equation is wrong because it divides by sizeof(struct cache_entry *). This is not how pointer arithmetic works!
A Coverity build of Git for Windows in preparation for the 2.25.0 release found this issue with the warning:
Pointer differences, such as cache_end - cache, are automatically scaled down by the size (8 bytes) of the pointed-to type (struct cache_entry *). Most likely, the division by sizeof(struct cache_entry *) is extraneous and should be eliminated.
This warning is correct.
This leaves us with the question "how did this even work?"
The problem that occurs with this incorrect pointer arithmetic is a performance-only bug, and a very slight one at that.
Since the entry count returned by clear_ce_flags_dir() is reduced by a factor of 8, the loop in clear_ce_flags_1() will re-process entries from those directories.
By inserting global counters into unpack-trees.c and tracing them with trace2_data_intmax() (in a private change, for testing), I was able to count how many times the loop inside clear_ce_flags_1() processed an entry and how many times clear_ce_flags_dir() was called.
Each of these is reduced by at least a factor of 8 with the current change.
A factor larger than 8 happens when multiple levels of directories are repeated.
Specifically, in the Linux kernel repo, the command
git sparse-checkout set LICENSES
restricts the working directory to only the files at root and in the LICENSES directory.
Here are the measured counts:
clear_ce_flags_1 loop blocks: Before: 11,520, After: 1,621
clear_ce_flags_dir calls: Before: 7,048, After: 606
While these are dramatic counts, the time spent in clear_ce_flags_1() is under one millisecond in each case, so the improvement is not measurable as an end-to-end time.
With Git 2.26 (Q1 2020), some rough edges in the sparse-checkout feature, especially around the cone mode, have been cleaned up.
See commit f998a3f, commit d2e65f4, commit e53ffe2, commit e55682e, commit bd64de4, commit d585f0e, commit 4f52c2c, commit 9abc60f (31 Jan 2020), and commit 9e6d3e6, commit 41de0c6, commit 47dbf10, commit 3c75406, commit d622c34, commit 522e641 (24 Jan 2020) by Derrick Stolee (derrickstolee).
See commit 7aa9ef2 (24 Jan 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 433b8aa, 14 Feb 2020)
sparse-checkout: fix cone mode behavior mismatch
Reported-by: Finn Bryant
Signed-off-by: Derrick Stolee
The intention of the special "cone mode" in the sparse-checkout feature is to always match the same patterns that are matched by the same sparse-checkout file as when cone mode is disabled.
When a file path is given to "git sparse-checkout set" in cone mode, then the cone mode improperly matches the file as a recursive path.
When setting the skip-worktree bits, files were not expecting the MATCHED_RECURSIVE response, and hence these were left out of the matched cone.
Fix this bug by checking for MATCHED_RECURSIVE in addition to MATCHED and add a test that prevents regression.
The documentation now includes:
When core.sparseCheckoutCone is enabled, the input list is considered a list of directories instead of sparse-checkout patterns.
The command writes patterns to the sparse-checkout file to include all files contained in those directories (recursively) as well as files that are siblings of ancestor directories.
The input format matches the output of git ls-tree --name-only. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings.
With Git 2.26 (Q1 2020), "git sparse-checkout" learned a new "add" subcommand.
See commit 6c11c6a (20 Feb 2020), and commit ef07659, commit 2631dc8, commit 4bf0c06, commit 6fb705a (11 Feb 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4d7dfc, 05 Mar 2020)
sparse-checkout: create 'add' subcommand
Signed-off-by: Derrick Stolee
When using the sparse-checkout feature, a user may want to incrementally grow their sparse-checkout pattern set.
Allow adding patterns using a new 'add' subcommand.
This is not much different from the 'set' subcommand, because we still want to allow the '--stdin' option and interpret inputs as directories when in cone mode and patterns otherwise.
When in cone mode, we are growing the cone.
This may actually reduce the set of patterns when adding directory A when A/B is already a directory in the cone. Test the different cases: siblings, parents, ancestors.
When not in cone mode, we can only assume the patterns should be appended to the sparse-checkout file.
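A hedged sketch of that shrinking effect, assuming Git 2.26 or later (for "add") and a throwaway repository with a hypothetical A/B and A/C layout: after set A/B, adding A should leave only the patterns for A, because A/B is now covered by the recursive A.

```shell
#!/bin/sh
# Sketch, assuming Git 2.26 or later; the A/B, A/C layout is hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p A/B A/C
echo root > top.txt
echo b > A/B/b.txt
echo c > A/C/c.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

git sparse-checkout init --cone
git sparse-checkout set A/B       # cone: root files plus everything in A/B
git sparse-checkout add A         # grow the cone; A now covers A/B
cat .git/info/sparse-checkout     # should shrink to the patterns for A alone
git sparse-checkout list
```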
And:
sparse-checkout: work with Windows paths
Signed-off-by: Derrick Stolee
When using Windows, a user may run 'git sparse-checkout set A\B\C' to add the Unix-style path A/B/C to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we add the string 'A/B/C' to the recursive hashset.
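The normalization idea can be shown with a one-line POSIX shell sketch. This is not Git's implementation; the helper normalize_path is hypothetical and simply uses tr to turn backslashes into forward slashes before the result would be treated as a cone directory.

```shell
#!/bin/sh
# Hypothetical helper illustrating the idea only: convert backslashes to
# forward slashes before treating the input as a cone directory.
normalize_path() {
  printf '%s\n' "$1" | tr '\\' '/'
}

normalize_path 'A\B\C'
```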
The sparse-checkout patterns have been forbidden from excluding all paths, leaving an empty working tree, for a long time.
With Git 2.27 (Q2 2020), this limitation has been lifted.
See commit ace224a (04 May 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e9acbd6, 08 May 2020)
sparse-checkout: stop blocking empty workdirs
Reported-by: Lars Schneider
Signed-off-by: Derrick Stolee
Remove the error condition when updating the sparse-checkout leaves an empty working directory.
This behavior was added in 9e1afb167 ("sparse checkout: inhibit empty worktree", 2009-08-20, Git v1.7.0-rc0 -- merge).
The comment was added in a7bc906f2 ("Add explanation why we do not allow to sparse checkout to empty working tree", 2011-09-22, Git v1.7.8-rc0 -- merge) in response to a "dubious" comment in 84563a624 ("unpack-trees.c: cosmetic fix", 2010-12-22, Git v1.7.5-rc0 -- merge).
With the recent "cone mode" and "git sparse-checkout init [--cone]" command, it is common to set a reasonable sparse-checkout pattern set of
/*
!/*/
which matches only files at root. If the repository has no such files, then their "git sparse-checkout init" command will fail.
Now that we expect this to be a common pattern, we should not have the commands fail on an empty working directory.
If it is a confusing result, then the user can recover with "git sparse-checkout disable" or "git sparse-checkout set". This is especially simple when using cone mode.