How can I know if `git gc --auto` has done something?

Problem description

I'm running `git gc --auto` as part of an automatic save script. I'd like to run further cleanup if `git gc --auto` has done something, but I'd like to skip that work if `git gc --auto` decides that nothing needs to be done. Is there a way to check the return value of `git gc --auto`, or to check beforehand whether it is necessary to run it?

Solution

With Git 2.30 (Q1 2021), "git maintenance", the extended big brother of "git gc" presented in the previous answer, continues to evolve.

It is more precise than git gc, and the options introduced in 2.30 make it possible to tell when it has done something, as the OP asked.
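One way to observe whether a maintenance run changed anything is to compare the object-store statistics before and after it. The following is a minimal sketch of that idea, not something taken from the answer or from Git itself: the helper names `object_store_state` and `did_change` are hypothetical, and the comparison relies only on the output of `git count-objects -v`.

```shell
#!/bin/sh
# Hypothetical helper: print object-store statistics for the repository
# at path $1 (loose-object count, pack count, sizes, ...).
object_store_state() {
    git -C "$1" count-objects -v
}

# Hypothetical helper: run a command and report whether the statistics
# of the repository at $1 differ afterwards.
# Returns 0 if they changed, 1 if they are identical.
did_change() {
    repo=$1
    shift
    before=$(object_store_state "$repo")
    "$@" >/dev/null 2>&1
    after=$(object_store_state "$repo")
    [ "$before" != "$after" ]
}

# Example use inside a save script (assumed workflow):
#
#   if did_change . git maintenance run --auto; then
#       echo "maintenance did something, running further cleanup"
#   fi
```

This only detects changes that are visible in the statistics and that have finished by the time the command returns. With plain `git gc --auto`, the `gc.autoDetach` setting can let the command return while the work continues in the background, in which case the second snapshot may be taken too early.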

See commit e841a79, commit a13e3d0, commit 52fe41f, commit efdd2f0, commit 18e449f, commit 3e220e6, commit 252cfb7, commit 28cb5e6 (25 Sep 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 52b8c8c, 27 Oct 2020)

maintenance: add incremental-repack task

Signed-off-by: Derrick Stolee

The previous change cleaned up loose objects using the 'loose-objects' task, which can be run safely in the background. Add a similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to repack all pack-files into a single pack-file. While this is the most space-efficient way to store object data, it is not time or memory efficient. This becomes extremely important if the repo is so large that a user struggles to store two copies of the pack on their disk.

Instead, perform an "incremental" repack by collecting a few small pack-files into a new pack-file. The multi-pack-index facilitates this process ever since 'git multi-pack-index expire' was added in 19575c7 ("multi-pack-index: implement 'expire' subcommand", 2019-06-10, Git v2.23.0-rc0 -- merge listed in batch #6) and 'git multi-pack-index repack' was added in ce1e4a1 ("midx: implement midx_repack()", 2019-06-10, Git v2.23.0-rc0 -- merge listed in batch #6).

The 'incremental-repack' task runs the following steps:

  1. 'git multi-pack-index write' creates a multi-pack-index file if one did not exist, and otherwise will update the multi-pack-index with any new pack-files that appeared since the last write. This is particularly relevant with the background fetch job.

    When the multi-pack-index sees two copies of the same object, it stores the offset data into the newer pack-file. This means that some old pack-files could become "unreferenced" which I will use to mean "a pack-file that is in the pack-file list of the multi-pack-index but none of the objects in the multi-pack-index reference a location inside that pack-file."

  2. 'git multi-pack-index expire' deletes any unreferenced pack-files and updates the multi-pack-index to drop those pack-files from the list. This is safe to do as concurrent Git processes will see the multi-pack-index and not open those packs when looking for object contents. (Similar to the 'loose-objects' job, there are some Git commands that open pack-files regardless of the multi-pack-index, but they are rarely used. Further, a user that self-selects to use background operations would likely refrain from using those commands.)

  3. 'git multi-pack-index repack --batch-size=<size>' collects a set of pack-files that are listed in the multi-pack-index and creates a new pack-file containing the objects whose offsets are listed by the multi-pack-index to be in those pack-files. The set of pack-files is selected greedily by sorting the pack-files by modified time and adding a pack-file to the set if its "expected size" is smaller than the batch size until the total expected size of the selected pack-files is at least the batch size. The "expected size" is calculated by taking the size of the pack-file divided by the number of objects in the pack-file and multiplied by the number of objects from the multi-pack-index with offset in that pack-file. The expected size approximates how much data from that pack-file will contribute to the resulting pack-file size. The intention is that the resulting pack-file will be close in size to the provided batch size.

    The next run of the incremental-repack task will delete these repacked pack-files during the 'expire' step.

    In this version, the batch size is set to "0" which ignores the size restrictions when selecting the pack-files. It instead selects all pack-files and repacks all packed objects into a single pack-file. This will be updated in the next change, but it requires doing some calculations that are better isolated to a separate change.
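The three steps above can be sketched by hand with the `git multi-pack-index` subcommands the commit message names. This is an illustration under assumptions, not the task's actual implementation: the function names `incremental_repack_by_hand` and `expected_size` are hypothetical, and the arithmetic multiplies before dividing so that integer division does not lose precision.

```shell
#!/bin/sh
# Hypothetical helper: run the write / expire / repack sequence described
# above in the current repository. $1 is the batch size; 0 means
# "repack everything into a single pack-file".
incremental_repack_by_hand() {
    batch_size=${1:-0}
    git multi-pack-index write &&
    git multi-pack-index expire &&
    git multi-pack-index repack --batch-size="$batch_size"
}

# Hypothetical helper: the "expected size" of one pack-file, i.e.
#   pack size / objects in pack * objects the multi-pack-index
#   references in that pack
# $1 = pack size in bytes
# $2 = number of objects in the pack
# $3 = number of those objects referenced by the multi-pack-index
expected_size() {
    echo $(( $1 * $3 / $2 ))
}

# Example: a 1000-byte pack holding 10 objects, 4 of which are still
# referenced by the multi-pack-index, has an expected size of 400 bytes.
#   expected_size 1000 10 4
```

A pack whose objects are mostly duplicated in newer packs therefore has a small expected size, which is why it is cheap to fold into the next batch.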

These steps are based on a similar background maintenance step in Scalar (and VFS for Git). This was incredibly effective for users of the Windows OS repository. After using the same VFS for Git repository for over a year, some users had thousands of pack-files that combined to up to 250 GB of data. We noticed a few users were running into the open file descriptor limits (due in part to a bug in the multi-pack-index fixed by af96fe3 ("midx: add packs to packed_git linked list", 2019-04-29, Git v2.22.0-rc1 -- merge)).

These pack-files were mostly small since they contained the commits and trees that were pushed to the origin in a given hour. The GVFS protocol includes a "prefetch" step that asks for pre-computed pack-files containing commits and trees by timestamp. These pack-files were grouped into "daily" pack-files once a day for up to 30 days. If a user did not request prefetch packs for over 30 days, then they would get the entire history of commits and trees in a new, large pack-file. This led to a large number of pack-files that had poor delta compression.

By running this pack-file maintenance step once per day, these repos with thousands of packs spanning 200+ GB dropped to dozens of pack-files spanning 30-50 GB. This was done all without removing objects from the system and using a constant batch size of two gigabytes. Once the work was done to reduce the pack-files to small sizes, the batch size of two gigabytes means that not every run triggers a repack operation, so the following run will not expire a pack-file. This has kept these repos in a "clean" state.

git maintenance now includes in its man page:

incremental-repack

The incremental-repack job repacks the object directory using the multi-pack-index feature. In order to prevent race conditions with concurrent Git commands, it follows a two-step process. First, it calls git multi-pack-index expire to delete pack-files unreferenced by the multi-pack-index file. Second, it calls git multi-pack-index repack to select several small pack-files and repack them into a bigger one, and then update the multi-pack-index entries that refer to the small pack-files to refer to the new pack-file. This prepares those small pack-files for deletion upon the next run of git multi-pack-index expire. The selection of the small pack-files is such that the expected size of the big pack-file is at least the batch size; see the --batch-size option for the repack subcommand in git multi-pack-index. The default batch-size is zero, which is a special case that attempts to repack all pack-files into a single pack-file.

And:

maintenance: add incremental-repack auto condition

Signed-off-by: Derrick Stolee

The incremental-repack task updates the multi-pack-index by deleting pack-files that have been replaced with new packs, then repacking a batch of small pack-files into a larger pack-file. This incremental repack is faster than rewriting all object data, but is slower than some other maintenance activities.

The 'maintenance.incremental-repack.auto' config option specifies how many pack-files should exist outside of the multi-pack-index before running the step.
These pack-files could be created by 'git fetch' commands or by the loose-objects task.
The default value is 10.

Setting the option to zero disables the task with the '--auto' option, and a negative value makes the task run every time.

git config now includes in its man page:

maintenance.incremental-repack.auto

This integer config option controls how often the incremental-repack task should be run as part of git maintenance run --auto. If zero, then the incremental-repack task will not run with the --auto option. A negative value will force the task to run every time. Otherwise, a positive value implies the command should run when the number of pack-files not in the multi-pack-index is at least the value of maintenance.incremental-repack.auto. The default value is 10.
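The rule described in the man page can be written as a small decision function. This is a sketch for illustration only: `should_run_incremental_repack` is a hypothetical name, and the number of pack-files outside the multi-pack-index is passed in as an argument rather than computed, since the text does not say how to obtain it.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented rule for
# maintenance.incremental-repack.auto.
# $1 = configured value of the option
# $2 = number of pack-files not in the multi-pack-index
# Returns 0 if the task would run under --auto, 1 otherwise.
should_run_incremental_repack() {
    threshold=$1
    packs_outside_midx=$2
    if [ "$threshold" -eq 0 ]; then
        return 1    # zero disables the task under --auto
    elif [ "$threshold" -lt 0 ]; then
        return 0    # a negative value forces the task every time
    fi
    [ "$packs_outside_midx" -ge "$threshold" ]
}

# Example configuration and invocation (assumed workflow):
#
#   git config maintenance.incremental-repack.auto 5
#   git maintenance run --auto --task=incremental-repack
```

With the default value of 10, the task is skipped under `--auto` until at least ten pack-files sit outside the multi-pack-index.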
