Slow git rev-list when iterating git commit history over time intervals

Problem description

I wrote a tool that uses git rev-list -n1 --before=X to iterate a Git commit history using fixed time intervals, so that I see the last revision for every year, month, etc.

The problem is that rev-list kicks off a new revision walk on every call, and it takes longer the farther back I go. Here are some samples from the Linux kernel source.

$ time git rev-list -n1 --before="Jan 1 2016" HEAD
a889331d759453fa7f424330f75ae4e2b9e02db4

real    0m1.395s
user    0m1.367s
sys     0m0.024s

$ time git rev-list -n1 --before="Jan 1 2015" HEAD
5b5e76218fbdbb71a01d5480f289ead624232876

real    0m2.349s
user    0m2.306s
sys     0m0.036s

$ time git rev-list -n1 --before="Jan 1 2005" HEAD

real    0m5.556s
user    0m5.435s
sys     0m0.105s

If I want to call rev-list in a loop over N decreasing dates, that loop runs N walks, each taking longer than the last. The docs talk about bitmaps and object traversal strategies to speed up history traversal, but I am having trouble understanding them. I tried git repack -ab followed by git rev-list --use-bitmap-index, but that didn't improve the results.

My only requirement is that given any position for HEAD, I can accurately pinpoint the first revision that appears before the date given to --before, following paths to ancestors if needed.

What is the best way to make rev-list faster for this use case?

Solution

Repeatedly scanning a list to select successive elements is O(N^2). It doesn't much matter how efficient each scan is; that N^2 is going to bite.

Generate one list with commit id and date, then strip what you don't want and generate your real log messages from the selected sha's. That's three passes total, not N.

git log --first-parent --pretty=%H\ %cd --date=short \
| awk '$2$3 != last { last=$2$3; print $1}' 'FS=[- ]' \
| git log --no-walk --stdin

That took fifteen seconds cold-cache on the linux repo, with a spinny-things hdd, listing 147 commits. A rerun took less than a second.
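If you want to switch between yearly, monthly, and daily granularity, the awk step in the pipeline above can be wrapped in a small function. This is only a sketch: latest_per_period is a made-up name, and it assumes the same "sha date" input that --pretty='%H %cd' --date=short produces. Unlike the one-liner, it remembers every period it has already seen, so a commit with an out-of-order date cannot make a period show up twice.

```shell
#!/bin/sh
# latest_per_period is a hypothetical helper name, not a git command.
# Input (stdin): "<sha> <YYYY-MM-DD>" lines, newest first.
# Output: the first sha seen for each year, month, or day.
latest_per_period() {
    awk -v period="${1:-month}" '
        BEGIN {
            # Prefix length of the date used as the grouping key:
            # 4 = YYYY, 7 = YYYY-MM, 10 = YYYY-MM-DD.
            n = (period == "year") ? 4 : ((period == "day") ? 10 : 7)
        }
        {
            key = substr($2, 1, n)
            # Print only the first commit encountered for each key.
            if (!(key in seen)) { seen[key] = 1; print $1 }
        }'
}

# Example use inside a repository (commented out so the file can be sourced anywhere):
# git log --first-parent --pretty='%H %cd' --date=short \
#   | latest_per_period year \
#   | git log --no-walk --stdin
```

It is still a single walk of the history followed by a single filter pass, so the cost stays roughly the same as the original pipeline.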

edit: subbing in --date-order for --first-parent to consider all paths took 25.1 seconds cold-cache, 7.9 seconds hot, to list 782 commits.
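If the cutoffs are arbitrary --before dates rather than calendar periods, the same one-walk idea still applies: list every commit once with its commit timestamp, then match the whole set of cutoffs in a single pass. The sketch below is an illustration, not a drop-in replacement for rev-list. first_before_each is a made-up name, the cutoffs are assumed to be Unix timestamps passed newest first, and a commit stamped exactly at a cutoff is treated as matching, which is an assumption about how you want ties handled.

```shell
#!/bin/sh
# first_before_each is a hypothetical helper name, not a git command.
# Arguments: cutoff times as Unix timestamps, newest first.
# Input (stdin): "<sha> <unix-timestamp>" lines in git log order.
# Output: "<cutoff> <sha>" for the first listed commit at or before each cutoff.
first_before_each() {
    awk -v cutoffs="$*" '
        BEGIN { n = split(cutoffs, c, " "); i = 1 }
        {
            # One commit may satisfy several cutoffs in a row.
            while (i <= n && $2 + 0 <= c[i] + 0) {
                print c[i], $1
                i++
            }
            # Stop reading once every cutoff has been answered.
            if (i > n) exit
        }'
}

# Example use inside a repository (date -d is GNU date syntax, an assumption
# about your platform; commented out so the file can be sourced anywhere):
# git log --pretty='%H %ct' \
#   | first_before_each "$(date -d 2016-01-01 +%s)" "$(date -d 2015-01-01 +%s)"
```

Because the cutoffs are in decreasing order, a commit that is too new for the current cutoff is also too new for all later ones, so the list never has to be rescanned. A cutoff older than every commit simply produces no line, like the empty 2005 result in the question.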
